opensoundscape.ml package
Submodules
opensoundscape.ml.cam module
Class activation maps (CAM) for OpenSoundscape models
- class opensoundscape.ml.cam.CAM(base_image, activation_maps=None, gbp_maps=None)
Bases: object
Object to hold and view Class Activation Maps, including guided backprop
Stores activation maps as .activation_maps and guided backprop maps as .gbp_maps; each is a Series indexed by class
- create_rgb_heatmaps(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), gbp_normalization_q=99)
create rgb numpy array of heatmaps overlaid on the sample
Can choose a subset of classes and activation/backprop modes
- Parameters:
class_subset – iterable of classes to visualize with activation maps
  - default None plots all classes
  - each item must be in the index of self.gbp_maps / self.activation_maps
  - note that a class None is created by CNN.generate_cams() when classes are not specified
mode – str selecting which maps to visualize, one of:
  - 'activation' [default]: overlay activation map
  - 'backprop': overlay guided backpropagation result
  - 'backprop_and_activation': overlay product of both maps
  - None: do not overlay anything on the original sample
show_base – if False, does not plot the image of the original sample [default: True]
alpha – opacity of the activation map overlay [default: 0.5]
color_cycle – iterable of colors for activation maps - cycles through the list using one color per class
gbp_normalization_q – guided backprop is normalized such that the q’th percentile of the map is 1 [default: 99]. This helps avoid gbp maps that are too dark to see. Lower values make brighter and noisier maps, higher values make darker and smoother maps.
- Returns:
numpy array of shape [w, h, 3] representing the image with CAM heatmaps
  - if mode is None, returns the original sample
  - if show_base is False, returns just the heatmaps
  - if mode is None and show_base is False, returns None
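The effect of gbp_normalization_q can be illustrated with a minimal, standard-library sketch. The function name, the nearest-rank percentile rule, and the clipping to [0, 1] are assumptions made for illustration; the library's own implementation may differ.

```python
import math


def normalize_by_percentile(values, q=99):
    """Scale values so the q-th percentile maps to 1, then clip to [0, 1].

    Hypothetical helper: uses the nearest-rank percentile definition.
    """
    ordered = sorted(values)
    # nearest-rank percentile: value at position ceil(q% of n), 1-indexed
    rank = max(1, math.ceil(q * len(ordered) / 100))
    scale = ordered[rank - 1]
    if scale <= 0:
        # nothing to scale by; return an all-zero map
        return [0.0 for _ in values]
    return [min(max(v / scale, 0.0), 1.0) for v in values]


pixels = [0.0, 1.0, 2.0, 4.0, 100.0]  # one outlier dominates the range
bright = normalize_by_percentile(pixels, q=80)   # lower q: brighter map
dark = normalize_by_percentile(pixels, q=100)    # higher q: darker map
```

With a lower q, the single outlier is clipped and the remaining values spread over more of the [0, 1] range, which matches the "brighter" behavior described for the parameter.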
- plot(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), figsize=None, plt_show=True, save_path=None, gbp_normalization_q=99, flipud=False)
Plot per-class activation maps, guided backpropagations, or their products
Do not pass both mode=None and show_base=False.
- Parameters:
class_subset – iterable of classes to visualize with activation maps
  - default None plots all classes
  - each item must be in the index of self.gbp_maps / self.activation_maps
  - note that a class None is created by CNN.generate_cams() when classes are not specified
mode – str selecting which maps to visualize, one of:
  - 'activation' [default]: overlay activation map
  - 'backprop': overlay guided backpropagation result
  - 'backprop_and_activation': overlay product of both maps
  - None: do not overlay anything on the original sample
show_base – if False, does not plot the image of the original sample [default: True]
alpha – opacity of the activation map overlay [default: 0.5]
color_cycle – iterable of colors for activation maps - cycles through the list using one color per class
gbp_normalization_q – guided backprop is normalized such that the q’th percentile of the map is 1 [default: 99]. This helps avoid gbp maps that are too dark to see. Lower values make brighter and noisier maps, higher values make darker and smoother maps.
figsize – the figure size for the plot [default: None]
plt_show – if True, runs plt.show() [default: True] - ignored if return_numpy=True
save_path – path to save image to [default: None does not save file]
flipud – if True, flips the image vertically before plotting [default: False]
- Returns:
(fig, ax) of matplotlib figure, or np.array if return_numpy=True
Note: if base_image does not have 3 channels, channels are averaged then copied across 3 RGB channels to create a greyscale image
Note 2: if return_numpy is True, fig and ax are never created; the method simply creates a numpy array representing the image with the CAMs overlaid and returns it
opensoundscape.ml.cnn module
classes for pytorch machine learning models in opensoundscape
For tutorials, see notebooks on opensoundscape.org
- opensoundscape.ml.cnn.CNN
alias of SpectrogramClassifier
- class opensoundscape.ml.cnn.SpectrogramClassifier(architecture, classes, sample_duration, sample_rate, single_target=False, preprocessor_dict=None, preprocessor_cls=<class 'opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor'>, device=None, **preprocessor_kwargs)[source]
Bases: SpectrogramModule
defines pure pytorch train, predict, and eval methods for a spectrogram classifier
- batch_forward(batch_samples, targets=None, avgpool=True)
Forward pass for a batch of data
- Parameters:
batch_samples – a batch of samples from a dataloader
targets – list of layers from self.network to extract outputs from. The key self.class_outputs_key (-1 by default) corresponds to final model output. If None, only returns final model output.
avgpool – bool, if True, applies global average pooling to intermediate outputs (averages across all dimensions except first to get a 1D vector per sample)
- Returns:
dictionary with a key for each output requested in targets. The key matching self.class_outputs_key corresponds to final model output.
- current_step
track number of complete training steps
- property device
- early_stopping_config
Early stopping configuration dictionary.
Early stopping halts training if the validation score does not improve for a specified number of steps (patience).
The metric monitored for improvement is defined by self.score_metric, but adjust “mode” according to whether the score should be minimized (loss) or maximized (accuracy, f1, auroc, avg precision, etc).
To enable early stopping, set self.early_stopping_config['enabled']=True and modify other parameters as desired.
  - 'patience': number of steps with no improvement before stopping
  - 'min_delta': minimum change in the monitored quantity to qualify as an improvement
  - 'mode': 'max' or 'min', whether to look for maximum (eg accuracy) or minimum (eg loss) of the monitored quantity
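The interaction of 'patience', 'min_delta', and 'mode' can be sketched in plain Python. This is an illustrative sketch only: the function name is hypothetical, and it assumes patience is counted in consecutive validation evaluations without improvement, which may not match how the library counts steps.

```python
def should_stop(scores, patience, min_delta=0.0, mode="max"):
    """Return True once `patience` consecutive scores fail to improve on the best.

    Hypothetical helper: `scores` is the sequence of validation scores so far.
    """
    best = None
    stale = 0  # consecutive evaluations without improvement
    for score in scores:
        if best is None:
            improved = True
        elif mode == "max":
            improved = score > best + min_delta
        else:  # mode == "min"
            improved = score < best - min_delta
        if improved:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return True
    return False


# accuracy-like score that plateaus: stop after 2 evaluations with no gain
plateau = should_stop([0.5, 0.6, 0.6, 0.6], patience=2, mode="max")
# loss-like score: smaller is better, so use mode="min"
loss_stall = should_stop([1.0, 0.9, 0.95, 0.91], patience=2, mode="min")
```

Choosing the wrong mode (for example 'max' while monitoring a loss) would treat every decrease as a failure to improve, which is why the mode has to follow the score metric.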
- embed(samples, batch_size=1, num_workers=0, target_layer=None, progress_bar=True, return_preds=False, avgpool=True, return_dfs=True, audio_root=None, output_size_warning=1000000000.0, **dataloader_kwargs)
Generate embeddings (intermediate layer outputs) for audio files/clips
Note: to capture embeddings on multiple layers, use self.__call__ with intermediate_layers argument directly. This wrapper only allows one target_layer.
Note: Output can be n-dimensional array (return_dfs=False) or pd.DataFrame with multi-index like .predict() (return_dfs=True). If avgpool=False, return_dfs is forced to False since we can’t create a DataFrame with >2 dimensions.
For advanced use cases (e.g. multiple target layers), use self.__call__() directly.
- Parameters:
samples – same as CNN.predict(): file path, list of file paths, OR pd.DataFrame with index containing audio file paths, OR a pd.DataFrame with multi-index (file, start_time, end_time)
batch_size – batch size to use for dataloader [default: 1]
num_workers – number of parallel CPU workers to use for dataloader [default: 0]
target_layer – layer from self.model._modules to extract outputs from - if None, attempts to use self.model.embedding_layer as default
progress_bar – bool, if True, shows a progress bar with tqdm [default: True]
return_preds – bool, if True, returns two outputs (embeddings, logits)
avgpool – bool, if True, applies global average pooling to intermediate outputs i.e. averages across all dimensions except first to get a 1D vector per sample
return_dfs – bool, if True, returns embeddings as pd.DataFrame with multi-index like .predict(); if False, returns np.array of embeddings [default: True]. If avgpool=False, overrides to return np.array since we can’t have a df with >2 dimensions
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
**dataloader_kwargs – passed to self.predict_dataloader()
- Returns:
(embeddings, preds) if return_preds=True, or embeddings if return_preds=False
types are pd.DataFrame if return_dfs=True, or np.array if return_dfs=False
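What avgpool=True does to an intermediate output can be sketched without any dependencies. The sketch assumes a single sample's feature map shaped [channels][height][width] and uses nested lists in place of tensors; the function name is hypothetical.

```python
def global_avg_pool(feature_map):
    """Average a [channels][height][width] nested list over height and width.

    Returns one value per channel, i.e. a 1D vector for the sample.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# two channels, each a 2x2 map
feature_map = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
embedding = global_avg_pool(feature_map)  # one value per channel
```

Because pooling collapses each sample to a 1D vector, a batch of pooled embeddings is 2D and fits in a DataFrame; without pooling the output keeps its spatial dimensions, which is why return_dfs is overridden in that case.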
- embed_to_hoplite_db(samples, db, deployment, project=None, file_to_datetime=None, target_layer=None, wandb_session=None, progress_bar=True, audio_root=None, embedding_exists_mode='skip', commit_frequency_batches=100, overflow_mode='warn', embedding_dim=None, strict_matching=False, **dataloader_kwargs)
Run inference on a dataloader, saving 1D outputs of target_layer to a hoplite database
Note that all samples are associated with a single deployment (e.g. one audio recorder in one season). Call this method separately for each deployment to associate samples with different deployments in the database.
- Parameters:
samples – (same as CNN.predict())
db – a hoplite database object or a path to a hoplite database folder - if a path is provided, the database will be created if it does not exist - when creating a new db, the embedding_dim argument must be provided
deployment – name of deployment (ie one recorder deployed once) to associate embeddings with
  - if deployment does not exist in db, it will be created
  - if you wish to include metadata per deployment (eg lat, lon, point name), first add the deployment to the db using perch_hoplite.db.interface.HopliteDB.insert_deployment()
project – optional project name to associate deployment with
file_to_datetime – optional function or dictionary mapping filenames to datetime objects
  - used to set recording start times in the database
  - if None, recording start times will not be set
  - if a function is provided, it should take a single argument (filename: str) and return a datetime.datetime object
  - if a dictionary is provided, it should map filenames (str) to datetime.datetime objects
target_layer – layer to extract embeddings from
  - if None [default], attempts to use architecture’s default target_layer
  - Note: only architectures created with opensoundscape 0.9.0+ will have a default target layer. See pytorch_grad_cam docs for suggestions.
  - Note: if multiple layers are provided, the activations are merged across layers (rather than returning separate activations per layer)
wandb_session – a wandb session to log progress to (e.g. return value of wandb.init())
progress_bar – bool, if True, shows a progress bar with tqdm [default: True]
audio_root – the root directory for relative paths to audio files
embedding_exists_mode – str, behavior when an embedding already exists for a given sample. Options are:
  - "skip": skip inserting the embedding (default)
  - "error": raise an error
  - "add": add a new embedding entry to the db with the same source info
  - Note: the strict_matching argument affects whether existing embeddings are only matched within a deployment/project or across all deployments/projects.
  - Note that hoplite doesn’t currently support removing or replacing existing entries
commit_frequency_batches – int, commit to db after every N batches [default: 100]
overflow_mode – 'warn', 'error', or 'ignore': behavior when embedding values exceed the range of float16, which is the range of values allowed in hoplite db [default: 'warn']
embedding_dim – int, dimension of the embeddings to be stored - only used when creating a new hoplite db - must match the output dimension of the model’s target_layer - if creating new db and embedding_dim is None, guesses based on self.classifier.in_features
strict_matching – bool, select strategy for matching existing deployments and embeddings
  - if True, deployments are only considered matching if both deployment name and project match; embeddings are only considered matching if project, deployment, source_id, and offset all match
  - if False [default], deployments from any project are matched by name only; embeddings are matched across all deployments and projects by source_id and offset only
**dataloader_kwargs – additional keyword arguments to pass to the dataloader
- Returns:
(embedding_db, dict with info about inserted window_id’s and failed samples)
- Effects:
Inserts embeddings into the provided hoplite database. Adds deployment and recording entries to db as needed.
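A callable suitable for file_to_datetime might look like the sketch below. The filename pattern (an 8-digit date and 6-digit time separated by an underscore) and the function name are hypothetical; adapt the regular expression to the naming scheme of your recorder.

```python
import datetime
import re
from pathlib import Path


def filename_to_datetime(filename):
    """Parse a recording start time from a name like 'recorder1_20230601_053000.wav'.

    Hypothetical filename format: <anything>_<YYYYMMDD>_<HHMMSS>.<ext>
    """
    match = re.search(r"(\d{8})_(\d{6})", Path(filename).name)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    return datetime.datetime.strptime("".join(match.groups()), "%Y%m%d%H%M%S")


start = filename_to_datetime("/data/site_a/recorder1_20230601_053000.wav")

# the dictionary form maps each filename to its datetime up front
files = ["/data/site_a/recorder1_20230601_053000.wav"]
file_to_datetime_dict = {f: filename_to_datetime(f) for f in files}
```

Either the function or the dictionary could then be passed as the file_to_datetime argument.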
- eval(targets=None, scores=None, reset_metrics=True)
compute single-target or multi-target metrics from targets and scores
Or, compute metrics on accumulated values in the TorchMetrics if targets is None
By default, the overall model score is “map” (mean average precision) for multi-target models (self.single_target=False) and “f1” (average of f1 score across classes) for single-target models.
Update self.torch_metrics to include the desired metrics.
- Parameters:
targets – 0/1 for each sample and each class - if targets is None, runs metric.compute() on each of self.torch_metrics (using accumulated values)
scores – continuous values in 0/1 for each sample and class - if targets is None, this is ignored
reset_metrics – if True, resets the metrics after computing them [default: True]
- Returns:
dictionary of metrics (name: value)
- Raises:
AssertionError – if targets are outside of range [0,1]
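For intuition about the default multi-target score, the sketch below computes average precision for one class and averages it across classes. It is a simplified, dependency-free illustration with hypothetical function names; the TorchMetrics implementation used by the library may handle ties and thresholds differently.

```python
def average_precision(targets, scores):
    """Average of precision values at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if targets[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    return precision_sum / hits if hits else float("nan")


def mean_average_precision(targets_by_class, scores_by_class):
    """Mean of per-class average precision (one list of targets/scores per class)."""
    aps = [
        average_precision(t, s) for t, s in zip(targets_by_class, scores_by_class)
    ]
    return sum(aps) / len(aps)


# one class, four samples: positives are ranked 1st and 3rd by score
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```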
- generate_cams(samples, method='gradcam', classes=None, target_layers=None, guided_backprop=False, progress_bar=True, audio_root=None, **dataloader_kwargs)
Generate activation and/or backprop heatmaps for each sample
- Parameters:
samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a file path (str or Path) to a single audio file, OR - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths
method – method to use for activation map. Can be str (choose from below), a class of pytorch_grad_cam (any subclass of BaseCAM), or None. If None, activation maps will not be created [default: 'gradcam']. str can be any of the following:
  - "gradcam": pytorch_grad_cam.GradCAM
  - "hirescam": pytorch_grad_cam.HiResCAM
  - "scorecam": pytorch_grad_cam.ScoreCAM
  - "gradcam++": pytorch_grad_cam.GradCAMPlusPlus
  - "ablationcam": pytorch_grad_cam.AblationCAM
  - "xgradcam": pytorch_grad_cam.XGradCAM
  - "eigencam": pytorch_grad_cam.EigenCAM
  - "eigengradcam": pytorch_grad_cam.EigenGradCAM
  - "layercam": pytorch_grad_cam.LayerCAM
  - "fullgrad": pytorch_grad_cam.FullGrad
  - "gradcamelementwise": pytorch_grad_cam.GradCAMElementWise
classes (list) – list of classes, will create maps for each class [default: None] if None, creates an activation map for the highest scoring class on a sample-by-sample basis
target_layers (list) – list of target layers for GradCAM
  - if None [default], attempts to use architecture’s default target_layer
  - Note: only architectures created with opensoundscape 0.9.0+ will have a default target layer. See pytorch_grad_cam docs for suggestions.
  - Note: if multiple layers are provided, the activations are merged across layers (rather than returning separate activations per layer)
guided_backprop – bool [default: False] if True, performs guided backpropagation for each class in classes. AudioSamples will have attribute .gbp_maps, a pd.Series indexed by class name
audio_root – str or Path, root directory to prepend to audio file paths in samples, if samples do not contain full paths. [default: None]
**dataloader_kwargs – passed to SafeAudioDataloader (incl: batch_size, num_workers, split_file_into_clips, bypass_augmentations, invalid_sample_behavior, overlap_fraction, final_clip, other DataLoader args)
- Returns:
a list of AudioSample objects with .cam attribute, an instance of the CAM class (visualize with sample.cam.plot()). See the CAM class for more details
See pytorch_grad_cam documentation for references to the source of each method.
- generate_samples(samples, invalid_samples_log=None, return_invalid_samples=False, audio_root=None, **dataloader_kwargs)
Generate AudioSample objects. Input options same as .predict()
- Parameters:
samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path as str or pathlib.Path
see .predict() documentation for other args
**dataloader_kwargs – any arguments to inference_dataloader_cls.__init__ except samples (uses samples) and collate_fn (uses identity) (Note: default class is SafeAudioDataloader)
- Returns:
a list of AudioSample objects - if return_invalid_samples is True, returns second value: list of paths to samples that failed to preprocess
Example:
```python
from opensoundscape.preprocess.utils import show_tensor_grid

# model is an existing CNN instance
samples = model.generate_samples(['/path/file1.wav', '/path/file2.wav'])
tensors = [s.data for s in samples]
show_tensor_grid(tensors, columns=3)
```
- classmethod load(path, unpickle=True)
load a model saved using CNN.save()
- Parameters:
path – path to file saved using CNN.save()
unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True, which saves the entire model object. If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]
- Returns:
new CNN instance
Note: if you used pickle=True when saving, the model object might not load properly across different versions of OpenSoundscape.
- load_weights(path, strict=True)
load network weights state dict from a file
For instance, load weights saved with .save_weights(). This is an in-place operation.
- Parameters:
path – file path with saved weights
strict – (bool) see torch.nn.Module.load_state_dict()
- log_file
specify a path to save output to a text file
- logging_level
amount of logging to self.log_file. 0 for nothing, 1,2,3 for increasing logged info
- loss_hist
list of batch loss values during training
- name = 'SpectrogramClassifier'
- per_class_metrics(targets, scores)
compute per-class metrics: au_roc, avg precision
can override this method to customize per-class metrics
- Parameters:
targets – 2d array of 0/1 for each sample and each class
scores – 2d array of continuous valued score for each sample and class
- Returns:
dictionary of per-class metrics: {class_name: {metric_name: value}}
- predict(samples, batch_size=1, num_workers=0, activation_layer=None, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='extend', bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, wandb_session=None, return_invalid_samples=False, progress_bar=True, audio_root=None, output_size_warning=1000000000.0, **dataloader_kwargs)
Generate predictions on a set of samples
Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)
- Parameters:
samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path (str or pathlib.Path)
batch_size – Number of files to load simultaneously [default: 1]
num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]
activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model [default: None]. Options:
  - None: no activation, return raw logit scores [-inf:inf]
  - 'softmax': scores across all classes sum to 1, scores between 0 and 1
  - 'sigmoid': each class is independent, scores between 0 and 1
  - 'softmax_and_logit': applies softmax first then logit
overlap_fraction – see opensoundscape.utils.generate_clip_times_df
clip_overlap – see opensoundscape.utils.generate_clip_times_df
clip_step – see opensoundscape.utils.generate_clip_times_df
final_clip – see opensoundscape.utils.generate_clip_times_df
bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.
invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.
raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log
wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb
return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]
progress_bar – bool, if True, shows a progress bar with tqdm [default: True]
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
output_size_warning – int, if >0, raises a warning if the number of output scores (clips * classes) exceeds this number, as this can cause heavy memory usage. Set to None or 0 to disable. [default: 1e9]
**dataloader_kwargs – additional arguments to self.predict_dataloader()
- Returns:
df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess
- Effects:
(1) wandb logging: If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top scoring samples are logged. Use self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.
(2) invalid sample logging: If invalid_samples_log is not None, saves a list of all file paths that failed to preprocess in invalid_samples_log as a text file
Note: if loading an audio file raises a PreprocessingError, the scores for that sample will be np.nan
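The activation_layer options correspond to standard transformations of the raw logits for one sample. The following standard-library sketch shows what each option computes; the function names mirror the option strings but are otherwise hypothetical, and the library applies the equivalent operations to tensors.

```python
import math


def softmax(logits):
    """Scores across classes sum to 1 (classes compete with each other)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Each class is scored independently in (0, 1)."""
    return [1 / (1 + math.exp(-x)) for x in logits]


def softmax_and_logit(logits):
    """Softmax followed by logit, mapping probabilities back to (-inf, inf).

    Undefined if a softmax probability rounds to exactly 0 or 1.
    """
    return [math.log(p / (1 - p)) for p in softmax(logits)]


raw = [1.0, 2.0, 3.0]  # raw logits for one sample with three classes
probs = softmax(raw)
independent = sigmoid(raw)
```

Softmax suits single-target models, where exactly one class is present per sample; sigmoid suits multi-target models, where any number of classes may be present.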
- profile(samples, batch_size=1, num_workers=0, forward=True, backward=True, bypass_augmentations=False, **dataloader_kwargs)
Profile the model preprocessing, forward, and backward speeds on a set of samples
- Parameters:
samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a file path (str or Path) to a single audio file, OR - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths
batch_size – number of samples to process simultaneously
num_workers – number of parallel CPU tasks for preprocessing
forward – bool, if True, profiles forward pass time [default: True]
backward – bool, if True, profiles backward pass time [default: True]
bypass_augmentations – bool, if True, bypasses data augmentations during preprocessing [default: False]
**dataloader_kwargs – additional keyword arguments to pass to the dataloader
- Returns:
a dictionary with timing information for:
  - breakdown of time spent on each preprocessing step (measured for one sample)
  - preprocessing time per batch and per sample (seconds)
  - if forward=True: forward pass time per batch and per sample (seconds)
  - if backward=True: backward pass time per batch and per sample (seconds)
Example:
```python
import opensoundscape as opso

m = opso.CNN('resnet18', [0, 1], 1, 32000)
# m.device = 'cpu'  # optionally set a specific device
m.network.to(m.device)
samples = opso.utils.make_clip_df([opso.birds_path] * 10, clip_duration=1)
results_dict = m.profile(samples, batch_size=32, num_workers=0)
```
- run_evaluation(validation_df, progress_bar=True, **kwargs)
Generate predictions on labeled data and compute evaluation metrics
Override this to customize the validation step during training. For example, you could run validation on multiple datasets and save the performance of each in self.valid_metrics[current_epoch][validation_dataset_name]
- Parameters:
validation_df – dataframe of validation samples
progress_bar – if True, show a progress bar with tqdm
**kwargs – passed to self.predict_dataloader()
- Returns:
metrics: dictionary of evaluation metrics calculated with self.torch_metrics
- Effects:
updates self.valid_metrics[current_epoch] with metrics for the current epoch
- property sample_duration
- property sample_rate
- save(path, save_hooks=False, pickle=False, error='raise')
save model with weights using torch.save()
load from saved file with cnn.load_model(path)
- Parameters:
path – file path for saved model object
save_hooks – retain forward and backward hooks on modules [default: False]. Note: True can cause issues when using wandb.watch()
pickle – if True, saves the entire model object using torch.save() [default: False]
  - Note: if using pickle=True, the entire object is pickled, which means that saving and loading model objects across OpenSoundscape versions might not work properly
  - pickle=True is useful for resuming training, because it retains the state of the optimizer, scheduler, loss function, etc
  - pickle=False is recommended for saving models for inference/deployment/sharing
error – behavior if saving with pickle fails [default: "raise"]
  - "raise": raise RuntimeError
  - "warn": issue a warning and save unpickled model instead
  - "ignore": no action (model not saved)
- save_onnx(path, activation_layer=None, include_preprocessor_output=True, include_embedding_output=True, include_classifier_output=True, **kwargs)
Export the model to ONNX format
The preprocessor must be a TorchSpectrogramPreprocessor with torch.nn.Modules in preprocessor.pipeline['transform'].transforms (see example below)
See also: to_onnx_model() to create opensoundscape.ONNXModel for inference
Requires that the onnx, onnxruntime, and onnxscript packages are installed
- Parameters:
path – file path to save the ONNX model; pass None to return an in-memory torch.onnx.ONNXProgram object without saving to disk
activation_layer – if provided, applies an activation layer to classifier outputs options: ‘softmax’, ‘sigmoid’, or None [default: None]
include_preprocessor_output – if True, includes the output of the preprocessor in the ONNX model outputs as key “preprocessor” [default: True]
include_embedding_output – if True, includes the output of the embedding layer in the ONNX model outputs as key “embedding” [default: True]
include_classifier_output – if True, includes the output of the classifier in the ONNX model outputs as key “classifier” [default: True]
**kwargs – additional keyword arguments passed to opensoundscape.ml.export.to_onnx_program()
- Returns:
onnx_program: a torch.onnx.ONNXProgram object
Example:
Exporting an EfficientNet model to ONNX format:
```python
from opensoundscape import CNN, preprocessors

model = CNN(
    architecture="efficientnet_b0",
    classes=[0, 1, 2, 3],
    sample_duration=3,
    preprocessor_cls=preprocessors.TorchSpectrogramPreprocessor,
    sample_rate=32000,
)
onnx_program = model.save_onnx("./opso_efficientnet.onnx")
```
Using the saved model for inference with onnx runtime:
```python
import onnx, onnxruntime
import numpy as np

combined_model = onnx.load("opso_efficientnet.onnx")
output_names = [node.name for node in combined_model.graph.output]

onnx.checker.check_model(combined_model)

EP_list = ["CPUExecutionProvider"]  # ["CUDAExecutionProvider", "CPUExecutionProvider"]
ort_session = onnxruntime.InferenceSession("opso_efficientnet.onnx", providers=EP_list)

# make up some random inputs
audio_samples_per_input = (
    combined_model.graph.input[0].type.tensor_type.shape.dim[2].dim_value
)
batch_size = 3
input_batched = np.random.rand(batch_size, 1, audio_samples_per_input).astype(
    np.float32
)

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: input_batched}
ort_outs = ort_session.run(None, ort_inputs)

# restore the name-value dictionary mapping of outputs
outs_dict = {name: ort_outs[i] for i, name in enumerate(output_names)}
print(f"shape of outputs for inference on one batch of batch size {batch_size}:")
print({k: v.shape for k, v in outs_dict.items()})
```
Example 2: Exporting a model with customized preprocessing transforms
```python
from opensoundscape import CNN, preprocessors

model = CNN(
    architecture="efficientnet_b0",
    classes=[0, 1, 2, 3],
    sample_duration=3,
    preprocessor_cls=preprocessors.TorchSpectrogramPreprocessor,
    sample_rate=32000,
    bandpass_range=(3000, 10000),
    lower_dB_range=-30,
    rescale_mean_sd=(-30, 20),
    spec_nfft=512,
    spec_window_length=512,
    spec_hop_length=128,
    # resize_ft=(200, 512),  # using resize_ft breaks serialization for json save/load!
    n_mels=64,
)
onnx_program = model.save_onnx("./opso_efficientnet_melspec.onnx")
```
Example 3: Writing a custom list of preprocessing transforms
```python
import torchaudio
from opensoundscape import CNN, preprocessors

model = CNN("resnet18", classes=[0], sample_duration=5, sample_rate=32000)

# custom list of torchaudio and torchvision transforms
my_transforms = [
    torchaudio.transforms.Spectrogram(
        n_fft=512,
        win_length=512,
        hop_length=128,
        center=False,  # highly recommended because default=True will zero-pad, creating extra columns
    ),
    torchaudio.transforms.AmplitudeToDB(top_db=80),
]
model.preprocessor = preprocessors.TorchSpectrogramPreprocessor(
    sample_rate=32000,
    sample_duration=model.preprocessor.sample_duration,
    torch_transforms=my_transforms,
)
onnx_program = model.save_onnx("./opso_efficientnet_custom.onnx")
```
- save_weights(path)
save just the weights of the network
This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())
- Parameters:
path – location to save weights file
- similarity_search_hoplite_db(query_samples, db, num_results=5, exact_search=False, search_subset_size=None, target_score=None, audio_root=None, search_kwargs=None, **embedding_kwargs)
Perform a similarity search in the Hoplite database.
- Parameters:
query_samples – audio examples for which to find most similar examples: file path, list of paths, or dataframe with (file, start_time, end_time) multi-index
db – a Hoplite database containing embeddings from the same model
num_results – The number of results to return for each query
exact_search – default False for usearch (faster), if True uses brute force search
search_subset_size – number of embeddings to compare with. If None [default], all embeddings are searched. For floats between 0 and 1, samples a proportion of the database. For ints, samples the specified number of embeddings. Note: only implemented for exact_search=True
target_score – if specified, searches for similarity scores close to target_score default [None] searches for most similar embeddings
audio_root – root directory for relative paths to query audio files
search_kwargs – dict of additional keyword arguments passed to db.ui.search(), or to brutalism.threaded_brute_search() if exact_search=True
  - exact_search=False: radius, threads, exact, log, progress
  - exact_search=True: batch_size, max_workers, rng_seed
**embedding_kwargs – additional keyword arguments passed to self.embed(), such as batch_size and num_workers
- Returns:
A dataframe with the search results, including columns:
  - query_file, query_start_time, query_end_time: the query sample info
  - file, window_id: the matched sample filepath and window_id from the database
  - start_time, end_time: the matched sample start and end time (relative to file) from the database
  - sort_score: the similarity score between the query and matched sample
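The roles of num_results and target_score can be illustrated with a small brute-force search over plain lists. This is only a conceptual sketch: the function name is hypothetical, the dot product is assumed as the similarity score, and the actual search is performed by the Hoplite database.

```python
def brute_force_search(query, embeddings, num_results=5, target_score=None):
    """Return (index, score) pairs for the best-matching embeddings.

    Assumes dot-product similarity. If target_score is given, results are
    ranked by closeness of their score to target_score instead of by
    highest score.
    """
    ranked = []
    for idx, emb in enumerate(embeddings):
        score = sum(q * e for q, e in zip(query, emb))
        if target_score is None:
            sort_key = score
        else:
            sort_key = -abs(score - target_score)
        ranked.append((sort_key, idx, score))
    ranked.sort(reverse=True)  # largest sort_key first
    return [(idx, score) for _, idx, score in ranked[:num_results]]


database = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
most_similar = brute_force_search(query, database, num_results=2)
near_target = brute_force_search(query, database, num_results=2, target_score=0.4)
```

Searching near a target score rather than for the maximum can surface borderline matches, which is one way to sample examples of intermediate similarity.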
- train(train_df, validation_df=None, steps=1000, batch_size=64, num_workers=0, save_path='.', save_interval=-1, log_interval=50, validation_interval=100, reset_optimizer=False, restart_scheduler=False, invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, progress_bar=True, audio_root=None, reload_best_at_end=True, **dataloader_kwargs)[source]
train the model on samples from train_df
If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().
- Parameters:
train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)
validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]
steps – number of steps (ie batches or updates) to train the model for
batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation
num_workers – number of parallel CPU tasks for preprocessing Note: use 0 for single (root) process (not 1)
save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]
save_interval – interval in steps to save model object with weights [default: -1 means only best.model and last.model are saved] Note: the best model is always saved to best.model in addition to other saved steps.
log_interval – interval in batches to print training loss/metrics
validation_interval – interval in steps to test the model on the validation set. Note that the model only updates its best score and saves the best.model file on steps where it performs validation.
reset_optimizer – if True, resets the optimizer rather than retaining state_dict of self.optimizer [default: False]
restart_scheduler – if True, resets the learning rate scheduler rather than retaining state_dict of self.scheduler [default: False]
invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file
raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log
wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb. For example:
`
import wandb
wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
session = wandb.init(entity='mygroup', project='project1', name='first_run')
...
model.train(..., wandb_session=session)
session.finish()
`
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
progress_bar – bool, if True, shows a progress bar with tqdm [default: True]
reload_best_at_end – if True, after training completes, reloads the best model weights into self.network [default: True] Best model is determined by validation set’s self.score_metric score
**dataloader_kwargs – additional arguments passed to train_dataloader()
- Effects:
If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.
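Because training is counted in steps rather than epochs, it can help to see which steps would trigger logging, validation, and saving for a given set of intervals. The sketch below is an illustration only: training_events is a hypothetical helper, and it assumes steps are counted from 1 and that an event fires whenever the step number is a multiple of its interval; the library's exact counting may differ.

```python
def training_events(steps, log_interval=50, validation_interval=100, save_interval=-1):
    """Map step number -> list of actions taken at that step (hypothetical helper).

    Assumes 1-based step counting; a non-positive interval disables that action.
    """
    events = {}
    for step in range(1, steps + 1):
        actions = []
        if log_interval > 0 and step % log_interval == 0:
            actions.append("log")
        if validation_interval > 0 and step % validation_interval == 0:
            actions.append("validate")
        if save_interval > 0 and step % save_interval == 0:
            actions.append("save")
        if actions:
            events[step] = actions
    return events
```

Under these assumptions, training_events(200) logs at steps 50, 100, 150, and 200 and validates at steps 100 and 200, so the best score could only change at those two steps.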
- verbose
amount of logging to stdout. 0 for nothing, 1,2,3 for increasing printed output
- class opensoundscape.ml.cnn.SpectrogramModule(architecture, classes, sample_duration, sample_rate, single_target=False, preprocessor_dict=None, preprocessor_cls=<class 'opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor'>, arch_weights='DEFAULT', **preprocessor_kwargs)[source]
Bases:
BaseModule
Parent class for SpectrogramClassifier (pytorch) and LightningSpectrogramModule (lightning)
implements functionality that is shared between both pure PyTorch and Lightning classes/workflows
- change_classes(new_classes, hidden_layers=None)[source]
change the classes that the model predicts
replaces the network’s final linear classifier layer with a new layer (or MLP, if hidden_layers is not None) initialized with random weights and the correct number of output features
Supports torch.nn.Linear and opensoundscape.ml.shallow_classifier.MLPClassifier as the classifier layer to update. Will raise an error if self.network.classifier_layer is a different type
- Parameters:
new_classes – list of class names
hidden_layers – list of hidden layer sizes for the new classifier - None: creates a single torch.nn.Linear layer - (int, …): creates an MLPClassifier object with hidden layers of the specified sizes; eg (100, 50) creates 2 hidden layers with 100 and 50 neurons, respectively - (): empty tuple creates an MLPClassifier with no hidden layers
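The three hidden_layers cases can be made concrete by listing the (in_features, out_features) shape of each linear layer that would result. This is a sketch under stated assumptions, not the library's code: classifier_layer_shapes is a hypothetical helper, and it assumes an MLPClassifier with no hidden layers reduces to a single input-to-output layer.

```python
def classifier_layer_shapes(in_features, n_classes, hidden_layers=None):
    """Return (classifier kind, list of (in, out) layer shapes) (hypothetical helper)."""
    if hidden_layers is None:
        # None: a single linear layer maps features directly to classes
        return "Linear", [(in_features, n_classes)]
    # tuple of ints: chain input -> hidden sizes -> output
    sizes = [in_features, *hidden_layers, n_classes]
    return "MLPClassifier", list(zip(sizes[:-1], sizes[1:]))
```

For a network with 512 input features and 3 classes, hidden_layers=(100, 50) yields the shapes (512, 100), (100, 50), (50, 3).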
- change_classifier(new_classifier, classes=None)[source]
Replaces the classifier layer
replaces the network’s final linear classifier layer with a new classifier
- Parameters:
new_classifier – the new classifier to replace the existing one typically, torch.nn.Linear or opensoundscape.ml.shallow_classifier.MLPClassifier object
classes – optional list of class names to set for the new classifier; if None, will attempt to copy from new_classifier.classes attribute
- property classifier
return the classifier layer of the network, based on .network.classifier_layer string
- compute_per_class_metrics
if True, compute and log per-class metrics during training/validation
- freeze_feature_extractor()[source]
freeze all layers except self.classifier
prepares the model for transfer learning where only the classifier is trained
uses the attribute self.network.classifier_layer (via the .classifier attribute) to identify the classifier layer
if this attribute is not set, raises an Exception; use freeze_layers_except() instead
- freeze_layers_except(train_layers=None)[source]
Freeze all parameters of a model except the parameters in the target_layer(s)
Freezing parameters means that the optimizer will not update the weights
Modifies the model in place!
- Parameters:
model – the model to freeze the parameters of
train_layers – layer or list/iterable of the layers whose parameters should not be frozen For example: pass model.classifier to train only the classifier
Example 1:
` freeze_all_layers_except(model, model.classifier) `
Example 2: freeze all but 2 layers
` freeze_all_layers_except(model, [model.layer1, model.layer2]) `
- lr_scheduler_step
track number of calls to lr_scheduler.step()
set to -1 to restart learning rate schedule from initial lr
this value is used to initialize the lr_scheduler’s last_epoch parameter. It is tracked separately from self.current_step because the lr_scheduler might be stepped once per epoch, per step, or at other intervals
Note that the initial learning rate is set via self.optimizer_params[‘kwargs’][‘lr’]
- network
a pytorch Module such as Resnet18 or a custom object
for convenience, __init__ also allows user to provide string matching a key from opensoundscape.ml.cnn_architectures.ARCH_DICT.
List options: opensoundscape.ml.cnn_architectures.list_architectures()
- property single_target
- opensoundscape.ml.cnn.list_model_classes()[source]
return list of available model class keyword strings (the keys registered in MODEL_CLS_DICT)
- opensoundscape.ml.cnn.load_model(path, device=None, unpickle=True)[source]
load a saved model object
This function handles models saved either as pickled objects or as a dictionary including weights, preprocessing parameters, architecture name, etc.
Note that pickled objects may not load properly across different versions of OpenSoundscape, while the dictionary format does not retain the full training state for resuming model training.
- Parameters:
path – file path of saved model
device – which device to load into, eg ‘cuda:1’ [default: None] will choose first gpu if available, otherwise cpu
unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True, which saves the entire model object. If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]
- Returns:
a model object with loaded weights
- opensoundscape.ml.cnn.register_model_cls(model_cls)[source]
add class to MODEL_CLS_DICT
this allows us to recreate the class when loading saved model file with load_model()
- opensoundscape.ml.cnn.use_resample_loss(model, train_df)[source]
Modify a model to use ResampleLoss for multi-target training
ResampleLoss may perform better than BCE Loss for multitarget problems in some scenarios.
- Parameters:
model – CNN object
train_df – dataframe of labels, used to calculate class frequency
opensoundscape.ml.cnn_architectures module
Module to initialize PyTorch CNN architectures with custom output shape
This module allows the use of several built-in CNN architectures from PyTorch. The architecture refers to the specific layers and layer input/output shapes (including convolution sizes and strides, etc) - such as the ResNet18 or EfficientNet B0 architecture.
We provide wrappers which modify the output layer to the desired shape (to match the number of classes). The way to change the output layer shape depends on the architecture, which is why we need a wrapper for each one. This code is based on pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
To use these wrappers, for example, if your model has 10 output classes, write
my_arch=resnet18(10)
Then you can initialize a model object from opensoundscape.ml.cnn with your architecture:
model=CNN(my_arch,classes,sample_duration, sample_rate)
or override an existing model’s architecture:
model.network = my_arch
- opensoundscape.ml.cnn_architectures.alexnet(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for AlexNet architecture
input size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.change_conv2d_channels(conv2d, num_channels=3, reuse_weights=True)[source]
Modify the number of input channels for a pytorch CNN
This function changes the input shape of a torch.nn.Conv2D layer to accommodate a different number of channels. It attempts to retain weights in the following manner: - If num_channels is less than the original, it will average weights across the original channels and apply them to all new channels. - if num_channels is greater than the original, it will cycle through the original channels, copying them to the new channels
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
num_channels – specify channels in input sample, eg [channels h,w] sample shape
reuse_weights – if True (default), averages (if num_channels<original) or cycles through (if num_channels>original) the original channel weights and adds them to the new Conv2D
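The weight-reuse rule described for change_conv2d_channels (average when reducing channels, cycle when adding channels) can be sketched on plain Python lists. This is an illustration only: adapt_input_channels is a hypothetical helper, each inner list stands in for one input channel's flattened kernel weights, and the real function operates on torch.nn.Conv2d weight tensors.

```python
from itertools import cycle, islice


def adapt_input_channels(channel_weights, num_channels):
    """Build weights for num_channels input channels (hypothetical helper).

    channel_weights: one list of weights per original input channel.
    """
    original = len(channel_weights)
    if num_channels < original:
        # fewer channels: average across the original channels,
        # then use that average for every new channel
        mean = [sum(values) / original for values in zip(*channel_weights)]
        return [list(mean) for _ in range(num_channels)]
    # same or more channels: cycle through the original channels in order
    return [list(w) for w in islice(cycle(channel_weights), num_channels)]
```

Going from 3 channels to 1 gives a single averaged channel, while going from 3 to 5 repeats the first two channels after the original three.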
- opensoundscape.ml.cnn_architectures.change_fc_output_size(fc, num_classes)[source]
Modify the number of output nodes of a fully connected layer
- Parameters:
fc – the fully connected layer of the model that should be modified
num_classes – number of output nodes for the new fc
- opensoundscape.ml.cnn_architectures.densenet121(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for densenet121 architecture
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.efficientnet_b0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for efficientnet_b0 architecture
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- Note: in v0.10.2, changed from using NVIDIA/DeepLearningExamples:torchhub repo implementation to native pytorch implementation
- opensoundscape.ml.cnn_architectures.efficientnet_b1(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for efficientnet_b1 architecture
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.efficientnet_b4(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for efficientnet_b4 architecture
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- Note: in v0.10.2, changed from using NVIDIA/DeepLearningExamples:torchhub repo implementation to native pytorch implementation
- opensoundscape.ml.cnn_architectures.freeze_params(model)[source]
disable gradient updates for all model parameters
- opensoundscape.ml.cnn_architectures.generic_make_arch(constructor, weights, num_classes, embed_layer, cam_layer, name, input_conv2d_layer, linear_clf_layer, freeze_feature_extractor=False, num_channels=3)[source]
construct a CNN architecture, then adapt the input channels and output layer according to the num_channels and num_classes arguments
works when first layer is conv2d and last layer is fully-connected Linear
input_size = 224
- Parameters:
constructor – function that creates a torch.nn.Module and takes weights argument
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html Passed to constructor()
num_classes – number of output nodes for the final layer
embed_layer – specify which layers outputs should be accessed for “embeddings”
cam_layer – specify a default layer for GradCAM/etc visualizations
name – name of the architecture, used for the constructor_name attribute to re-load from saved version
input_conv2d_layer – name of first Conv2D layer that can be accessed with .get_submodule(); string formatted as .-delimited list of attribute names or list indices, e.g. “features.0”
linear_clf_layer – name of final Linear classification fc layer that can be accessed with .get_submodule(); string formatted as .-delimited list of attribute names or list indices, e.g. “classifier.0.fc”
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.list_architectures()[source]
return list of available architecture keyword strings
- opensoundscape.ml.cnn_architectures.resnet101(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet101 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.resnet152(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet152 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.resnet18(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet18 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.resnet34(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet34 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.resnet50(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for ResNet50 architecture
input_size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.set_layer_from_name(module, layer_name, new_layer)[source]
assign an attribute of an object using a string name
- Parameters:
module – object to assign attribute to
layer_name – string name of the attribute to assign. The name is formatted with . delimiter and can contain either attribute names or list indices, e.g. “network.classifier.0.0.fc” sets network.classifier[0][0].fc. This type of string is given by torch.nn.Module.named_modules()
new_layer – replace layer with this torch.nn.Module instance
See also: torch.nn.Module.named_modules(), torch.nn.Module.get_submodule()
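The dotted-name convention used by set_layer_from_name can be illustrated without PyTorch by walking attribute names and list indices on ordinary Python objects. This is a minimal sketch, not the library's implementation: set_by_dotted_name is a hypothetical helper, and SimpleNamespace objects and lists stand in for modules and their indexed children.

```python
from types import SimpleNamespace


def set_by_dotted_name(obj, dotted_name, new_value):
    """Assign new_value at the location named by a .-delimited path (hypothetical helper)."""
    *parents, last = dotted_name.split(".")
    # walk down to the parent of the target, treating digits as list indices
    for part in parents:
        obj = obj[int(part)] if part.isdigit() else getattr(obj, part)
    # replace the final attribute or list element
    if last.isdigit():
        obj[int(last)] = new_value
    else:
        setattr(obj, last, new_value)


# stand-in for a model whose classifier is a nested list of blocks
model = SimpleNamespace(classifier=[[SimpleNamespace(fc="old_layer")]])
set_by_dotted_name(model, "classifier.0.0.fc", "new_layer")
```

After the call, model.classifier[0][0].fc holds the new value, mirroring the “network.classifier.0.0.fc” example in the parameter description.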
- opensoundscape.ml.cnn_architectures.squeezenet1_0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for squeezenet architecture
input size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
num_channels – specify channels in input sample, eg [channels h,w] sample shape
- opensoundscape.ml.cnn_architectures.unfreeze_params(model)[source]
enable gradient updates for all model parameters
- opensoundscape.ml.cnn_architectures.vgg11_bn(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]
Wrapper for vgg11 architecture
input size = 224
- Parameters:
num_classes – number of output nodes for the final layer
freeze_feature_extractor – if False (default), entire network will have gradients and can train; if True, feature block is frozen and only final layer is trained
weights – string containing version name of the pre-trained classification weights to use for this architecture. if ‘DEFAULT’, model is loaded with best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html
opensoundscape.ml.dataloaders module
- class opensoundscape.ml.dataloaders.SafeAudioDataloader(*args: Any, **kwargs: Any)[source]
Bases:
DataLoader
Create DataLoader for inference, wrapping a SafeDataset
SafeDataset contains AudioFileDataset or AudioSampleDataset depending on sample type
During inference, we allow the user to pass any of these formats for samples: - list of file paths - Dataframe with file as index - Dataframe with (file, start_time, end_time) of clips as MultiIndex - Dataframe with (file, start_time, end_time) as columns - Dataframe with (file, start_time) as column - Dataframe with (file) as column - CategoricalLabels object
If start_times are not specified, it will automatically determine the number of clips that can be created from the file (with overlap between subsequent clips based on overlap_fraction)
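How many clips fit in a file for a given overlap can be sketched as follows. This is an illustration under assumptions rather than the library's logic: clip_times is a hypothetical helper, clip_duration is an assumed argument name, and the sketch drops any incomplete final clip, whereas the actual handling is controlled by final_clip (see opensoundscape.utils.generate_clip_times_df).

```python
def clip_times(file_duration, clip_duration, overlap_fraction=0.0):
    """List (start, end) times of full-length clips in a file (hypothetical helper)."""
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    # each clip starts this far after the previous one
    step = clip_duration * (1 - overlap_fraction)
    clips = []
    i = 0
    # keep only clips that end at or before the end of the file
    while i * step + clip_duration <= file_duration:
        start = i * step
        clips.append((start, start + clip_duration))
        i += 1
    return clips
```

With a 10 second file, 4 second clips, and overlap_fraction=0.5, consecutive clips start 2 seconds apart and four clips fit.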
- Parameters:
samples – any of the following: - list of file paths - Dataframe with file, start_time, end_time of clips as index - Dataframe with (file, start_time, end_time) as columns - Dataframe with (file, start_time) as columns - Dataframe with file as index - Dataframe with (file) as column - CategoricalLabels object
preprocessor – preprocessor object, eg AudioPreprocessor or SpectrogramPreprocessor
overlap_fraction – see opensoundscape.utils.generate_clip_times_df
clip_overlap – see opensoundscape.utils.generate_clip_times_df
clip_step – see opensoundscape.utils.generate_clip_times_df
final_clip – see opensoundscape.utils.generate_clip_times_df
bypass_augmentations – if True, don’t apply any augmentations [default: True]
invalid_sample_behavior – how to handle samples that fail to preprocess, one of “substitute”, “placeholder”, “raise”, or “none” - “substitute”: pick another sample - “placeholder”: return a placeholder value (zeros) for the sample - “raise”: raise the error - “none”: return None
collate_fn – function to collate list of AudioSample objects into batches. If None, uses collate_fn=collate_audio_samples to return a tuple of (data, labels) tensors. Default is identity, which returns a list of AudioSample objects (no collation)
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files
**kwargs – any arguments to torch.utils.data.DataLoader
- Returns:
DataLoader that returns lists of AudioSample objects when iterated (if collate_fn is identity)
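The four invalid_sample_behavior options can be sketched as a wrapper around indexing a dataset. This is only an illustration of the documented options, not the SafeDataset implementation: safe_get is a hypothetical helper, a float stands in for the zeros placeholder, and trying the following indices in order for “substitute” is an assumption about how another sample is picked.

```python
def safe_get(dataset, index, invalid_sample_behavior, placeholder=0.0):
    """Fetch dataset[index], handling failures as requested (hypothetical helper)."""
    behaviors = ("substitute", "placeholder", "raise", "none")
    if invalid_sample_behavior not in behaviors:
        raise ValueError(f"invalid_sample_behavior must be one of {behaviors}")
    try:
        return dataset[index]
    except Exception:
        if invalid_sample_behavior == "raise":
            raise  # propagate the original preprocessing error
        if invalid_sample_behavior == "none":
            return None
        if invalid_sample_behavior == "placeholder":
            return placeholder
        # "substitute": try the other samples in order until one succeeds
        for offset in range(1, len(dataset)):
            try:
                return dataset[(index + offset) % len(dataset)]
            except Exception:
                continue
        raise  # no sample could be loaded at all
```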
- preprocessor
do not override or modify this attribute, as it will have no effect
- samples
do not override or modify this attribute, as it will have no effect
- opensoundscape.ml.dataloaders.collate_audio_samples(samples)[source]
generate batched tensors of data and labels from list of AudioSample
assumes that s.data is a Tensor and s.labels is a list/array for each item in samples, and that every sample has labels for the same classes.
- Parameters:
samples – iterable of AudioSample objects (or other objects with attributes .data as Tensor and .labels as list/array)
- Returns:
(samples, labels) tensors of shape (batch_size, *) & (batch_size, n_classes)
- opensoundscape.ml.dataloaders.collate_audio_samples_to_dict(samples)[source]
generate batched tensors of data and labels (in a dictionary). returns collated samples: a dictionary with keys “samples” and “labels”
assumes that s.data is a Tensor and s.labels is a list/array for each sample S, and that every sample has labels for the same classes.
- Parameters:
samples – iterable of AudioSample objects (or other objects with attributes .data as Tensor and .labels as list/array)
- Returns:
dictionary of {“samples”: batched tensor of samples, “labels”: batched tensor of labels}
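The shape of the collated output can be sketched with plain Python lists standing in for tensors. This is an illustration only, not the library's function: collate_to_dict is a hypothetical helper, SimpleNamespace objects stand in for AudioSample, and the real function returns batched tensors rather than nested lists.

```python
from types import SimpleNamespace


def collate_to_dict(samples):
    """Batch .data and .labels from each sample into a dictionary (hypothetical helper)."""
    samples = list(samples)
    # every sample must carry labels for the same set of classes
    if len({len(s.labels) for s in samples}) > 1:
        raise ValueError("all samples must have labels for the same classes")
    return {
        "samples": [s.data for s in samples],
        "labels": [list(s.labels) for s in samples],
    }


batch = collate_to_dict(
    [
        SimpleNamespace(data=[0.1, 0.2], labels=[1, 0]),
        SimpleNamespace(data=[0.3, 0.4], labels=[0, 1]),
    ]
)
```

Here batch["samples"] has one entry per sample and batch["labels"] has one row of class labels per sample, corresponding to the (batch_size, *) and (batch_size, n_classes) shapes.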
opensoundscape.ml.datasets module
Preprocessors: pd.Series child with an action sequence & forward method
- class opensoundscape.ml.datasets.AudioFileDataset(*args: Any, **kwargs: Any)[source]
Bases:
Dataset
Base class for audio datasets with OpenSoundscape (use in place of torch Dataset)
Custom Dataset classes should subclass this class or its children.
Datasets in OpenSoundscape contain a Preprocessor object which is responsible for the procedure of generating a sample for a given input. The DataLoader handles a dataframe of samples (and potentially labels) and uses a Preprocessor to generate samples from them.
- Parameters:
samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index of (path,start_time,end_time) per clip, OR - a list or np.ndarray of audio file paths
Notes for input dataframe: df must have audio paths in the index. If label_df has labels, the class names should be the columns, and the values of each row should be 0 or 1. If data does not have labels, label_df will have no columns
preprocessor – an object of BasePreprocessor or its children which defines the operations to perform on input samples
bypass_augmentations – if True, skips Actions with .is_augmentation=True
audio_root – optionally pass a root directory (pathlib.Path or str) to prepend to each file path - if None (default), samples must contain full paths to files
**kwargs – passed to make_clip_df via _ingest_samples_argument
- Returns:
sample (AudioSample object)
- Raises:
PreprocessingError – if exception is raised during __getitem__
- Effects:
- self.invalid_samples will contain a set of paths that did not successfully produce a list of clips with start/end times
- audio_root
path to prepend to all audio file paths when loading
- bypass_augmentations
if True, skips Actions with .is_augmentation=True
- classes
list of classes to which multi-hot labels correspond
- classmethod from_categorical_df(categorical_labels, preprocessor, class_list, bypass_augmentations=False)[source]
Create AudioFileDataset from a DataFrame with a column listing categorical labels
e.g. where df['labels'] = [['a','b'], [], ['a','c']]
- Parameters:
categorical_labels – DataFrame with index (file) or (file, start_time, end_time) and ‘label’ column containing lists of labels or integers corresponding to class names
preprocessor – Preprocessor object
bypass_augmentations – if True, skip augmentations with .is_augmentation=True
- Returns:
AudioFileDataset object
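The relationship between categorical labels and the multi-hot labels that a dataset stores (one column per class) can be sketched as below. This illustrates the conversion only and is not the library's code: categorical_to_multi_hot is a hypothetical helper, and it handles labels given as class names, not the integer form.

```python
def categorical_to_multi_hot(categorical_labels, class_list):
    """Convert lists of class names to rows of 0/1 values (hypothetical helper)."""
    rows = []
    for labels in categorical_labels:
        present = set(labels)
        # one 0/1 entry per class, in the order given by class_list
        rows.append([1 if cls in present else 0 for cls in class_list])
    return rows


multi_hot = categorical_to_multi_hot([["a", "b"], [], ["a", "c"]], ["a", "b", "c"])
```

With class_list ['a', 'b', 'c'], the three samples above become the rows [1, 1, 0], [0, 0, 0], and [1, 0, 1].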
- head(n=5)[source]
out-of-place copy of first n samples
performs df.head(n) on self.label_df
- Parameters:
n – number of first samples to return, see pandas.DataFrame.head() [default: 5]
- Returns:
a new dataset object
- invalid_samples
set of file paths that raised exceptions during preprocessing
- label_df
dataframe containing file paths, clip times, and multi-hot labels (one column per class)
- preprocessor
Preprocessor object containing a .pipeline of ordered preprocessing operations
- class opensoundscape.ml.datasets.EmbeddingDataset(*args: Any, **kwargs: Any)[source]
Bases:
Dataset
simple dataset wrapper for embedding features and labels
- Parameters:
features – tensor or np.array of input features; first dimension should be samples
labels – tensor or np.array of target labels; first dimension should be samples
opensoundscape.ml.lightning module
- class opensoundscape.ml.lightning.LightningSpectrogramModule(*args: Any, **kwargs: Any)[source]
Bases:
SpectrogramModule, LightningModule
- fit_with_trainer(train_df, validation_df=None, epochs=1, batch_size=1, num_workers=0, save_path='.', invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, checkpoint_path=None, **kwargs)[source]
train the model on samples from train_df
If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .fit_with_trainer().
- Parameters:
train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)
validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]
batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation
num_workers – number of parallel CPU tasks for preprocessing Note: use 0 for single (root) process (not 1)
save_path – location to save intermediate and best model objects [default=”.”, ie current location of script]
save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.
log_interval – interval in batches to print training loss/metrics
validation_interval – interval in epochs to test the model on the validation set. Note that the model will only update its best score and save the best.model file on epochs in which it performs validation.
invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file
raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log
wandb_session – a wandb session to log to (Note: can also pass logger kwarg with any Lightning logger object) - pass the value returned by wandb.init() to progress log to a Weights and Biases run - if None, does not log to wandb. For example:
```
import wandb
wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
session = wandb.init(entity='mygroup', project='project1', name='first_run')
...
model.fit_with_trainer(..., wandb_session=session)
session.finish()
```
**kwargs – any arguments to pytorch_lightning.Trainer(), such as accelerator, precision, logger, accumulate_grad_batches, etc. Note: the max_epochs kwarg is overridden by the epochs argument
- Returns:
a trained pytorch_lightning.Trainer object
- Effects:
If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.
- forward(samples)[source]
standard Lightning method defining action to take on each batch for inference
typically returns logits (raw, untransformed model outputs)
- load_weights(path, strict=True)[source]
load network weights state dict from a file
For instance, load weights saved with .save_weights(). This is an in-place operation.
- Parameters:
path – file path with saved weights
strict – (bool) see torch.nn.Module.load_state_dict()
- predict_with_trainer(samples, batch_size=1, num_workers=0, activation_layer=None, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='extend', bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, return_invalid_samples=False, lightning_trainer_kwargs=None, dataloader_kwargs=None)[source]
Generate predictions on a set of samples
Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)
- Parameters:
samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path (str or pathlib.Path)
batch_size – Number of files to load simultaneously [default: 1]
num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]
activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw scores (ie logit, [-inf:inf]) - ‘softmax’: scores all classes sum to 1 - ‘sigmoid’: all scores in [0,1] but don’t sum to 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]
overlap_fraction – see opensoundscape.utils.generate_clip_times_df
clip_overlap – see opensoundscape.utils.generate_clip_times_df
clip_step – see opensoundscape.utils.generate_clip_times_df
final_clip – see opensoundscape.utils.generate_clip_times_df
bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.
invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.
raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log
wandb_session – a wandb session to log to - pass the value returned by wandb.init() to progress log to a Weights and Biases run - if None, does not log to wandb
return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]
lightning_trainer_kwargs – dictionary of keyword args to pass to __call__, which are then passed to lightning.Trainer.__init__; see lightning.Trainer documentation for options. [Default: None] passes no kwargs
dataloader_kwargs – dictionary of keyword args to self.predict_dataloader()
- Returns:
df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess
- Effects:
(1) wandb logging: If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top scoring samples are logged. Use self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.
(2) invalid sample logging: If invalid_samples_log is not None, saves a list of all file paths that failed to preprocess in invalid_samples_log as a text file
- Note: if loading an audio file raises a PreprocessingError, the scores for that sample will be np.nan
- save(path, save_hooks=False, weights_only=False)[source]
save model with weights using Trainer.save_checkpoint()
load from saved file with LightningSpectrogramModule.load_from_checkpoint()
Note: saving and loading model objects across OpenSoundscape versions will not work properly. Instead, use .save_weights() and .load_weights() (but note that architecture, customizations to preprocessing, training params, etc will not be retained using those functions).
For maximum flexibility in further use, save the model with both .save() and .save_torch_dict() or .save_weights().
- Parameters:
path – file path for saved model object
save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()
- save_weights(path)[source]
save just the weights of the network
This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())
- Parameters:
path – location to save weights file
opensoundscape.ml.loss module
loss function classes to use with opensoundscape models
- class opensoundscape.ml.loss.BCELossWeakNegatives(*args: Any, **kwargs: Any)[source]
Bases:
BCEWithLogitsLoss
BCEWithLogitsLoss that applies a weak negative weight to nan labels in the target.
This is different from soft labeling: we treat nan labels as negatives, then apply element-wise weighting to reduce their contribution to the loss.
- Parameters:
weak_negative_weight – weight to apply to nan labels in target
**kwargs – passed to nn.BCEWithLogitsLoss
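The weighting described above can be sketched in plain Python. This is an illustration under stated assumptions, not the class's implementation: `weak_negative_bce` and `bce_with_logits` are hypothetical helpers, the loss is averaged over all elements, and the default weight of 0.01 is taken from the fit() documentation in this package.

```python
import math


def bce_with_logits(logit, target):
    # numerically stable binary cross entropy on a raw logit:
    # max(x, 0) - x * t + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def weak_negative_bce(logits, targets, weak_negative_weight=0.01):
    """Mean BCE where NaN targets count as negatives with a reduced weight."""
    total = 0.0
    for logit, target in zip(logits, targets):
        if math.isnan(target):
            # NaN label: treat as a negative (0), then down-weight its loss term
            total += weak_negative_weight * bce_with_logits(logit, 0.0)
        else:
            total += bce_with_logits(logit, target)
    # assumption: plain mean over all elements
    return total / len(logits)


logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, float("nan")]  # the third label is unknown
loss = weak_negative_bce(logits, targets)
strict_loss = weak_negative_bce(logits, targets, weak_negative_weight=1.0)
```

With weak_negative_weight=1.0 the NaN label behaves like an ordinary negative, so `strict_loss` is larger than `loss`.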
- class opensoundscape.ml.loss.BCEWithLogitsLoss_hot(*args: Any, **kwargs: Any)[source]
Bases:
BCEWithLogitsLoss
use pytorch’s nn.BCEWithLogitsLoss for one-hot labels by simply converting y from long to float
- Parameters:
**kwargs – passed to nn.BCEWithLogitsLoss
- class opensoundscape.ml.loss.CrossEntropyLoss_hot(*args: Any, **kwargs: Any)[source]
Bases:
CrossEntropyLoss
use pytorch’s nn.CrossEntropyLoss for one-hot labels by converting labels from 1-hot to integer labels
throws a ValueError if labels are not one-hot
- Parameters:
**kwargs – passed to nn.CrossEntropyLoss
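A minimal sketch of the label conversion, assuming labels arrive as rows of 0/1 values. `one_hot_to_class_index` is a hypothetical helper, and how the library itself validates the labels is not shown in this reference.

```python
def one_hot_to_class_index(label_row):
    """Return the index of the single 1 in a one-hot row, or raise ValueError."""
    ones = [i for i, value in enumerate(label_row) if value == 1]
    zeros = sum(1 for value in label_row if value == 0)
    # a one-hot row has exactly one 1 and zeros everywhere else
    if len(ones) != 1 or zeros != len(label_row) - 1:
        raise ValueError(f"labels must be one-hot, got {label_row}")
    return ones[0]


batch = [[0, 1, 0], [1, 0, 0]]  # two samples, three classes
class_indices = [one_hot_to_class_index(row) for row in batch]
```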
- opensoundscape.ml.loss.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None)[source]
helper function for BCE loss in ResampleLoss class
- opensoundscape.ml.loss.reduce_loss(loss, reduction)[source]
Reduce loss as specified.
- Parameters:
loss (Tensor) – Elementwise loss tensor.
reduction (str) – Options are “none”, “mean” and “sum”.
- Returns:
Reduced loss tensor.
- Return type:
Tensor
- opensoundscape.ml.loss.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]
Apply element-wise weight and reduce loss.
- Parameters:
loss (Tensor) – Element-wise loss.
weight (Tensor) – Element-wise weights.
reduction (str) – Same as built-in losses of PyTorch.
avg_factor (float) – Average factor when computing the mean of losses.
- Returns:
Processed loss values.
- Return type:
Tensor
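The interaction between weight, reduction, and avg_factor can be sketched on plain lists. This is an illustration, not the library code: the `_sketch` functions are hypothetical, and the rule that avg_factor replaces the element count only when reduction is "mean" is an assumption based on how similar helpers commonly behave.

```python
def reduce_loss_sketch(loss, reduction):
    """Reduce a flat list of element-wise losses."""
    if reduction == "none":
        return list(loss)
    if reduction == "mean":
        return sum(loss) / len(loss)
    if reduction == "sum":
        return sum(loss)
    raise ValueError(f"unknown reduction: {reduction}")


def weight_reduce_loss_sketch(loss, weight=None, reduction="mean", avg_factor=None):
    """Apply element-wise weights, then reduce."""
    if weight is not None:
        loss = [value * w for value, w in zip(loss, weight)]
    if avg_factor is None:
        return reduce_loss_sketch(loss, reduction)
    if reduction == "mean":
        # assumption: divide the summed loss by avg_factor instead of len(loss)
        return sum(loss) / avg_factor
    if reduction == "none":
        return list(loss)
    raise ValueError("avg_factor cannot be combined with reduction='sum'")


elementwise = [1.0, 2.0, 3.0]
weights = [1.0, 0.0, 1.0]  # the second element is masked out
mean_loss = weight_reduce_loss_sketch(elementwise, weight=weights)
masked_mean = weight_reduce_loss_sketch(elementwise, weight=weights, avg_factor=2)
```

Here `masked_mean` averages over the two unmasked elements only, which is one typical use of an averaging factor.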
opensoundscape.ml.safe_dataset module
Dataset wrapper to handle errors gracefully in Preprocessor classes
A SafeDataset handles errors in a potentially misleading way: If an error is raised while trying to load a sample, the SafeDataset will instead load a different sample. The indices of any samples that failed to load will be stored in ._invalid_indices.
This behavior may be desirable for training a model, but could cause silent errors when predicting with a model (replacing a bad file with a different file), so always check ._invalid_indices after using a SafeDataset.
based on an implementation by @msamogh in nonechucks (github.com/msamogh/nonechucks/)
- class opensoundscape.ml.safe_dataset.SafeDataset(dataset, invalid_sample_behavior)[source]
Bases:
object
A wrapper for a Dataset that handles errors when loading samples
WARNING: When iterating, will skip the failed sample, but when using within a DataLoader, finds the next good sample and uses it for the current index (see __getitem__).
Note that this class does not subclass DataSet. Instead, it contains a .dataset attribute that is a DataSet (or AudioFileDataset, which subclasses DataSet).
- Parameters:
dataset – a torch Dataset instance or child such as AudioFileDataset
eager_eval – If True, checks if every file is able to be loaded during initialization (logs _valid_indices and _invalid_indices)
Attributes: _valid_indices and _invalid_indices can be accessed later to check which samples raised Exceptions.
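The substitution behavior described in the warning can be sketched as follows. `NextGoodSampleWrapper` and `FlakyDataset` are hypothetical toy classes; wrapping around to the start of the dataset and storing failed indices in a set are assumptions, not confirmed details of SafeDataset.

```python
class NextGoodSampleWrapper:
    """Hypothetical wrapper: on a failed load, fall back to the next loadable sample."""

    def __init__(self, dataset):
        self.dataset = dataset
        self._invalid_indices = set()

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        n = len(self.dataset)
        for offset in range(n):
            candidate = (idx + offset) % n  # assumption: wrap around at the end
            try:
                return self.dataset[candidate]
            except Exception:
                # remember which index failed so the caller can inspect it later
                self._invalid_indices.add(candidate)
        raise IndexError("no sample in the dataset could be loaded")


class FlakyDataset:
    """Toy dataset in which index 1 always fails to load."""

    def __len__(self):
        return 3

    def __getitem__(self, idx):
        if idx == 1:
            raise ValueError("corrupt file")
        return f"sample_{idx}"


safe = NextGoodSampleWrapper(FlakyDataset())
substituted = safe[1]  # index 1 fails, so the sample at index 2 is returned instead
```

The caller receives a valid sample for index 1, which is exactly why the set of invalid indices should be checked after prediction.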
opensoundscape.ml.sampling module
classes for strategically sampling within a DataLoader
- class opensoundscape.ml.sampling.ClassAwareSampler(*args: Any, **kwargs: Any)[source]
Bases:
Sampler
In each batch of samples, pick a limited number of classes to include and give even representation to each class
- class opensoundscape.ml.sampling.ImbalancedDatasetSampler(*args: Any, **kwargs: Any)[source]
Bases:
Sampler
Samples elements randomly from a given list of indices for imbalanced dataset
- Parameters:
indices (list, optional) – a list of indices
num_samples (int, optional) – number of samples to draw
callback_get_label (func) – a callback-like function which takes two arguments: dataset and index
Based on Imbalanced Dataset Sampling by davinnovation (https://github.com/ufoym/imbalanced-dataset-sampler)
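One common way to sample from an imbalanced dataset is to weight each sample by the inverse of its class frequency. The sketch below illustrates that idea only; whether this class uses exactly this weighting is an assumption, and `inverse_frequency_weights` is a hypothetical helper.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


labels = ["a", "a", "a", "b"]  # class "b" is under-represented
weights = inverse_frequency_weights(labels)

# draw indices with replacement; each class has the same total weight
rng = random.Random(0)
sampled_indices = rng.choices(range(len(labels)), weights=weights, k=8)
```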
opensoundscape.ml.shallow_classifier module
- class opensoundscape.ml.shallow_classifier.MLPClassifier(*args: Any, **kwargs: Any)[source]
Bases:
Module
initialize a fully connected NN (MLP) with ReLU activations
- Parameters:
input_size – length of 1-d tensors passed as input samples
output_size – number of classes at the output layer
hidden_layer_sizes – default () empty tuple creates a 1-layer regression classifier; specify a sequence of hidden layers by the number of elements in each. For example, (100,) creates 1 hidden layer with 100 elements
classes (optional) – list of class names, if provided should have len=output_size - default: None
weights – optionally pass a pytorch weight_dict of model weights to load; default None initializes the model with random weights
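How hidden_layer_sizes translates into a stack of linear layers can be sketched by listing the (in_features, out_features) pair of each layer. `mlp_layer_shapes` is a hypothetical helper for illustration and does not build a torch model.

```python
def mlp_layer_shapes(input_size, output_size, hidden_layer_sizes=()):
    """List the (in_features, out_features) pair of each fully connected layer."""
    sizes = [input_size, *hidden_layer_sizes, output_size]
    # consecutive sizes form the input/output dimensions of each layer
    return list(zip(sizes[:-1], sizes[1:]))


single_layer = mlp_layer_shapes(512, 3)  # no hidden layers
one_hidden = mlp_layer_shapes(512, 3, hidden_layer_sizes=(100,))
```

With an empty tuple there is a single layer mapping inputs directly to classes; with (100,) there are two layers joined by one hidden layer of 100 elements.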
- fit(train_features, train_labels, validation_features=None, validation_labels=None, batch_size=128, steps=1000, optimizer=None, criterion=None, device=torch.device, validation_interval=1, logging_interval=100, early_stopping_patience=None)[source]
train a PyTorch model on features and labels with batching and early stopping
Assumes all data can fit in memory. Training uses batched DataLoaders for efficient processing. If validation data is provided, the model with the lowest validation loss is automatically restored at the end of training (early stopping).
Defaults are for multi-target label problems and assume train_labels is an array of 0/1 of shape (n_samples, n_classes)
- Parameters:
model – a torch.nn.Module object to train
train_features – input features for training, often embeddings; should be a valid input to model(); generally shape (n_samples, n_features)
train_labels – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()
validation_features – input features for validation; if None, does not perform validation
validation_labels – labels for validation; if None, does not perform validation
batch_size – batch size for training; if fewer samples than batch_size, the entire dataset is used as a single batch [Default: 128]
steps – number of training steps (forward/backward passes on one batch) [Default: 1000]
optimizer – torch.optim optimizer to use; default None uses AdamW
criterion – loss function to use; default None uses BCELossWeakNegatives() (appropriate for multi-label classification); this loss function treats NaN labels as weak negatives, using a default weight of 0.01 for NaN labels compared to strong labels
device – torch.device to use; default is torch.device(‘cpu’); can also be e.g. torch.device(‘cuda:0’) for first CUDA GPU or torch.device(‘mps’) for Mac with M1/M2
validation_interval – how often to validate the model during training; if validation_features and validation_labels are provided, validation is performed every validation_interval steps
logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed
early_stopping_patience – if provided and validation data is available, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]
- classmethod from_torch_linear(linear_layer, classes=None)[source]
initialize an MLPClassifier from a torch.nn.Linear layer
Initializes 1-layer MLP, copying weights from linear_layer
- Parameters:
linear_layer – a torch.nn.Linear layer whose weight and bias will be used to initialize the classifier layer of the MLPClassifier
classes (optional) – list of class names, if provided should have len=output_size; default: None
- opensoundscape.ml.shallow_classifier.augmented_embed(embedding_model, sample_df, n_augmentation_variants, batch_size=1, num_workers=0, device=torch.device, audio_root=None)[source]
Embed samples using augmentation during preprocessing
- Parameters:
embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model or Bioacoustics Model Zoo model like Perch, BirdNET, HawkEars)
sample_df – dataframe with samples to embed
n_augmentation_variants – number of augmented variants to generate for each sample
batch_size – batch size for embedding; default 1
num_workers – number of workers for embedding; default 0
device – torch.device to use; default is torch.device(‘cpu’)
- Returns:
x_train, y_train – the embedded training samples and their labels, as torch.tensors
- opensoundscape.ml.shallow_classifier.count_dets_hoplite(db, classifier, classes, min_score=None, max_score=None, score_bins=None, batch_size=1024, date_range=None, time_range=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, progress_bar=False)[source]
Count detections in score bins/ranges based on classifier predictions and filters
Compared to select_from_hoplite, this function does not return the selected clips but just counts the number of clips in each score bin/range for each class. This can be quick and memory efficient for counting detections in large datasets if you don’t need clip info.
- Parameters:
db – hoplite database containing embeddings
classifier – MLPClassifier object or other classifier object to call on the torch.tensor embeddings
classes – list of class names to select clips for; if None, selects clips for every class in classifier
min_score – minimum score to filter clips by existing score in the database; if None, does not threshold by min score
max_score – maximum score to filter clips by existing score in the database; if None, does not restrict by max score
score_bins – if provided, a list of tuples (low, high) score ranges to count detections in:
if None, reports all scores above min_score and below max_score in a single bin
if provided, min_score and max_score are ignored and bins are determined by score_bins
batch_size – n samples simultaneously processed when applying classifier to embeddings; default 1024
date_range – tuple of (start_date, end_date) to filter clips by date; Formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format; if None, does not filter by date Can pass (date,None) or (None,date) to filter by only start or end date, respectively
time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day. Formats: datetime.datetime, datetime.time or string in “HH:MM:SS” format. Note: filters by time of day of the _recording_ start time (rather than audio clip start time). Assumes time zone match between time_range values and recording timestamps in the database
deployments – list of deployment names to filter by; if None, does not filter by deployment
projects – list of project names to filter by; if None, does not filter by project
recordings – list of recording names to filter by; if None, does not filter by recording
deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument
recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument
windows_filter – custom filter dict for windows; if provided, overrides date_range, time_range arguments
annotations_filter – custom filter dict for annotations in hoplite DB
- Returns:
counts – dict of dicts with counts[class][bin_range] = count of clips for class in score bin; if score_bins is None, bin_range is (min_score, max_score) (if min_score and/or max_score are also None, uses -inf and/or +inf)
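The shape of the returned counts can be sketched on in-memory scores. This is an illustration only: `count_scores_in_bins` is a hypothetical helper that takes a plain dict of scores rather than a database, the class names are made up, and treating each bin as half-open (low <= score < high) is an assumption.

```python
import math


def count_scores_in_bins(scores_by_class, min_score=None, max_score=None, score_bins=None):
    """Count scores per class in each (low, high) bin."""
    if score_bins is None:
        # a single bin bounded by min_score / max_score, open-ended when None
        low = -math.inf if min_score is None else min_score
        high = math.inf if max_score is None else max_score
        score_bins = [(low, high)]
    counts = {}
    for class_name, scores in scores_by_class.items():
        counts[class_name] = {
            (low, high): sum(1 for score in scores if low <= score < high)
            for (low, high) in score_bins
        }
    return counts


scores = {"wood_thrush": [-2.0, 0.5, 1.5, 3.0], "veery": [0.1, 0.2]}
binned = count_scores_in_bins(scores, score_bins=[(0.0, 1.0), (1.0, 5.0)])
single_bin = count_scores_in_bins(scores, min_score=0.0)
```

When score_bins is given, min_score and max_score play no role, matching the parameter description above.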
- opensoundscape.ml.shallow_classifier.fit(model, train_features, train_labels, validation_features=None, validation_labels=None, batch_size=128, steps=1000, optimizer=None, criterion=None, device=torch.device, validation_interval=1, logging_interval=100, early_stopping_patience=None)[source]
train a PyTorch model on features and labels with batching and early stopping
Assumes all data can fit in memory. Training uses batched DataLoaders for efficient processing. If validation data is provided, the model with the lowest validation loss is automatically restored at the end of training (early stopping).
Defaults are for multi-target label problems and assume train_labels is an array of 0/1 of shape (n_samples, n_classes)
- Parameters:
model – a torch.nn.Module object to train
train_features – input features for training, often embeddings; should be a valid input to model(); generally shape (n_samples, n_features)
train_labels – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()
validation_features – input features for validation; if None, does not perform validation
validation_labels – labels for validation; if None, does not perform validation
batch_size – batch size for training; if fewer samples than batch_size, the entire dataset is used as a single batch [Default: 128]
steps – number of training steps (forward/backward passes on one batch) [Default: 1000]
optimizer – torch.optim optimizer to use; default None uses AdamW
criterion – loss function to use; default None uses BCELossWeakNegatives() (appropriate for multi-label classification); this loss function treats NaN labels as weak negatives, using a default weight of 0.01 for NaN labels compared to strong labels
device – torch.device to use; default is torch.device(‘cpu’); can also be e.g. torch.device(‘cuda:0’) for first CUDA GPU or torch.device(‘mps’) for Mac with M1/M2
validation_interval – how often to validate the model during training; if validation_features and validation_labels are provided, validation is performed every validation_interval steps
logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed
early_stopping_patience – if provided and validation data is available, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]
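The interplay of validation_interval and early_stopping_patience, with patience counted in training steps rather than validation evaluations, can be sketched without a model. `run_with_early_stopping` is a hypothetical control-flow skeleton, and the validation losses are made-up numbers; the real function also trains and restores weights.

```python
def run_with_early_stopping(val_loss_at_step, steps, validation_interval=1,
                            early_stopping_patience=None):
    """Return (best_step, best_loss, last_step_run) for a toy training loop."""
    best_loss = float("inf")
    best_step = None
    last_step = 0
    for step in range(1, steps + 1):
        last_step = step
        # a real loop would run one forward/backward pass on a batch here
        if step % validation_interval != 0:
            continue
        loss = val_loss_at_step(step)
        if loss < best_loss:
            # a real loop would also snapshot the model weights here
            best_loss, best_step = loss, step
        elif (early_stopping_patience is not None
              and step - best_step >= early_stopping_patience):
            break  # no improvement for `early_stopping_patience` steps
    return best_step, best_loss, last_step


toy_losses = {10: 0.9, 20: 0.5, 30: 0.6, 40: 0.7, 50: 0.8}
best_step, best_loss, stopped_at = run_with_early_stopping(
    lambda step: toy_losses[step],
    steps=100,
    validation_interval=10,
    early_stopping_patience=20,
)
```

In this toy run the best loss occurs at step 20, and training stops at step 40 because 20 steps have passed without improvement.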
- opensoundscape.ml.shallow_classifier.fit_classifier_on_embeddings(embedding_model, classifier_model, train_df, validation_df, n_augmentation_variants=0, embedding_batch_size=1, embedding_num_workers=0, steps=1000, optimizer=None, criterion=None, device=torch.device, early_stopping_patience=None, logging_interval=100, validation_interval=1, audio_root=None)[source]
Embed samples with an embedding model, then fit a classifier on the embeddings
wraps embedding_model.embed() with fit(clf,…)
Also supports generating augmented variations of the training samples
Note: if embedding takes a while and you might want to fit multiple times, consider embedding the samples first then running fit(…) rather than calling this function.
- Parameters:
embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model or Bioacoustics Model Zoo model like Perch, BirdNET, HawkEars)
classifier_model – a torch.nn.Module object to train, e.g. MLPClassifier or final layer of CNN
train_df – dataframe with training samples and labels; see opensoundscape.ml.cnn.train() train_df argument
validation_df – dataframe with validation samples and labels; see opensoundscape.ml.cnn.train() validation_df argument; if None, skips validation
n_augmentation_variants – if 0 (default), embeds training samples without augmentation; if >0, embeds each training sample with stochastic augmentation n_augmentation_variants times
embedding_batch_size – batch size for embedding; default 1
embedding_num_workers – number of workers for embedding; default 0
steps – model fitting parameters, see fit()
optimizer – model fitting parameters, see fit()
criterion – model fitting parameters, see fit()
device – model fitting parameters, see fit()
early_stopping_patience – if provided, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]
logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed
validation_interval – how often to validate the model during training; if validation_df is provided, validation is performed every validation_interval steps
audio_root – if provided, used as prefix for audio files in train_df and validation_df; if None, assumes train_df and validation_df already have absolute audio paths
- Returns:
x_train, y_train, x_val, y_val, metrics – the embedded training and validation samples and their labels, as torch.tensor, plus a dictionary of validation metrics for the best model found during training
- opensoundscape.ml.shallow_classifier.fit_on_hoplite(classifier, hoplite_db, train_df, validation_df=None, batch_size=128, steps=10000, optimizer=None, criterion=None, device=torch.device, validation_interval=100, logging_interval=100, early_stopping_patience=None, progress_bar=False, **kwargs)[source]
train a PyTorch classifier on Hoplite Embedding DB and label dataframe
Defaults are for multi-target label problems and assume train_df is a dataframe of 0/1 per class with multi-index (file, start_time, end_time)
- Parameters:
classifier – a torch.nn.Module object to train
hoplite_db – a HopliteDB instance containing the embeddings to train on
train_df – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()
validation_df – labels for validation; if None, does not perform validation
validation_labels – labels for validation; if None, does not perform validation
batch_size – batch size for training; if fewer samples than batch_size, the entire dataset is used as a single batch [Default: 128]
steps – number of training steps (epochs); each step, all data is passed forward and backward, and the optimizer updates the weights [Default: 10_000]
optimizer – torch.optim optimizer to use; default None uses AdamW
criterion – loss function to use; default None uses BCELossWeakNegatives() (appropriate for multi-label classification); this loss function treats NaN labels as weak negatives, using a default weight of 0.01 for NaN labels compared to strong labels
device – torch.device to use; default is torch.device(‘cpu’)
validation_interval – how often to validate the model during training; if validation_features and validation_labels are provided, validation is performed every validation_interval steps
logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed
early_stopping_patience – if provided and validation data is available, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]
progress_bar – whether to show a progress bar during training; default False
**kwargs – additional keyword arguments passed to HopliteDataset; see HopliteDataset.__init__()
- opensoundscape.ml.shallow_classifier.predict_on_hoplite(db, samples, classifier, clip_duration=None, batch_size=1024, return_df=True, device=torch.device, **kwargs)[source]
Apply model to embeddings from database for each clip in samples
- Parameters:
db – hoplite database containing embeddings
samples – a dataframe of clips or a list of audio files; if a dataframe, it should have columns “file”, “start_time”, “end_time” specifying the clips to apply the model to
classifier – MLPClassifier object or other classifier object to call on the torch.tensor embeddings
clip_duration – provide clip length (s) if passing files rather than pre-defined file/start_time/end_time clips
batch_size – n samples simultaneously processed when applying classifier to embeddings; default 1024
return_df – if True, returns a dataframe with the same index as samples and columns for each class; if False, returns a numpy array of predictions. Uses classifier.classes if available for df column names, otherwise uses integer column names
**kwargs – additional keyword arguments to pass to HopliteDataset
- Returns:
predictions for each clip
- Return type:
pandas.DataFrame or numpy.ndarray
See also
select_from_hoplite if samples are already embedded and you wish to select filtered (random/top-scoring/all) clips
- opensoundscape.ml.shallow_classifier.select_from_hoplite(db, classifier, classes, k=5, strategy: Literal['top_k', 'random_k', 'all'] = 'top_k', batch_size=1024, date_range=None, time_range=None, min_score=None, max_score=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, random_state=None, return_windows=False, progress_bar=False, warn_no_matches=False)[source]
Extract top-scoring or random clips from the database based on classifier predictions and filters
- Parameters:
db – hoplite database containing embeddings
classifier – MLPClassifier object or other classifier object to call on the torch.tensor embeddings
classes – list of class names to select clips for; if None, selects clips for every class in classifier
k – number of clips to return per class; default 5 (ignored if strategy=”all”)
strategy – which clips to select: “top_k” to return the top k clips for each class “random_k” to return k random clips “all” to return all clips (ignores k) default “top_k”
batch_size – n samples simultaneously processed when applying classifier to embeddings; default 1024
date_range – tuple of (start_date, end_date) to filter clips by date; Formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format; if None, does not filter by date Can pass (date,None) or (None,date) to filter by only start or end date, respectively
time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day Formats: datetime.datetime, datetime.time or string in “HH:MM:SS” format Note: filters by time of day of the _recording_ start time (rather than audio clip start time) Assumes time zone match between time_range values and recording timestamps in the database
min_score – minimum score to filter clips by existing score in the database; if None, does not threshold by min score
max_score – maximum score to filter clips by existing score in the database; if None, does not restrict by max score
deployments – list of deployment names to filter by; if None, does not filter by deployment
projects – list of project names to filter by; if None, does not filter by project
recordings – list of recording names to filter by; if None, does not filter by recording
deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument
recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument
windows_filter – custom filter dict for windows; if provided, overrides date_range, time_range arguments
annotations_filter – custom filter dict for annotations in hoplite DB
warn_no_matches – if True, raises a warning if no clips are found for a class after applying filters and score thresholds; default False
- Returns:
dict of {class_name: list of matching windows} if return_windows=True; otherwise a dataframe with columns for class, score, and window info
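The three selection strategies can be sketched on a list of (clip, score) pairs for a single class. `select_clips` is a hypothetical helper for illustration; it does not query a database, and capping random_k at the number of available clips is an assumption.

```python
import heapq
import random


def select_clips(scored_clips, k=5, strategy="top_k", random_state=None):
    """scored_clips: list of (clip_id, score) pairs for one class."""
    if strategy == "all":
        return list(scored_clips)  # k is ignored
    if strategy == "top_k":
        # the k highest-scoring clips, best first
        return heapq.nlargest(k, scored_clips, key=lambda pair: pair[1])
    if strategy == "random_k":
        rng = random.Random(random_state)  # seeded for reproducibility
        return rng.sample(scored_clips, min(k, len(scored_clips)))
    raise ValueError(f"unknown strategy: {strategy}")


clips = [("clip_a", 0.1), ("clip_b", 0.9), ("clip_c", 0.5)]
top_two = select_clips(clips, k=2, strategy="top_k")
random_two = select_clips(clips, k=2, strategy="random_k", random_state=0)
everything = select_clips(clips, strategy="all")
```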
opensoundscape.ml.utils module
Utilities for .ml
- opensoundscape.ml.utils.apply_activation_layer(x, activation_layer=None)[source]
applies an activation layer to a set of scores
- Parameters:
x – input values
activation_layer –
None [default]: return original values
‘softmax’: apply softmax activation
‘sigmoid’: apply sigmoid activation
‘softmax_and_logit’: apply softmax then logit transform
- Returns:
values with activation layer applied. Note: if x is None, returns None
Note: casts x to float before applying softmax, since torch’s softmax implementation doesn’t support int or Long type
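The four options can be sketched for a single sample's scores using only the standard library. `apply_activation_sketch` is a hypothetical stand-in that operates on a flat list, whereas the library function works on tensors.

```python
import math


def softmax(scores):
    # subtract the max before exponentiating for numerical stability
    shifted = [score - max(scores) for score in scores]
    exps = [math.exp(value) for value in shifted]
    total = sum(exps)
    return [value / total for value in exps]


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def logit(probability):
    return math.log(probability / (1.0 - probability))


def apply_activation_sketch(scores, activation_layer=None):
    if scores is None:
        return None
    scores = [float(score) for score in scores]  # cast to float first
    if activation_layer is None:
        return scores
    if activation_layer == "softmax":
        return softmax(scores)
    if activation_layer == "sigmoid":
        return [sigmoid(score) for score in scores]
    if activation_layer == "softmax_and_logit":
        return [logit(p) for p in softmax(scores)]
    raise ValueError(f"unknown activation_layer: {activation_layer}")


raw_scores = [2.0, 0.0]
probabilities = apply_activation_sketch(raw_scores, "softmax")
logit_scores = apply_activation_sketch(raw_scores, "softmax_and_logit")
```

For two classes, softmax followed by logit yields the difference between the two raw scores, so `logit_scores` is approximately [2.0, -2.0].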
- opensoundscape.ml.utils.cas_dataloader(dataset, batch_size, num_workers)[source]
Return a dataloader that uses the class aware sampler
Class aware sampler tries to balance the examples per class in each batch. It selects just a few classes to be present in each batch, then samples those classes for even representation in the batch.
- Parameters:
dataset – a pytorch dataset type object
batch_size – see DataLoader
num_workers – see DataLoader
- opensoundscape.ml.utils.check_labels(label_df, classes)[source]
check that classes and label_df.columns are the same, otherwise raise a helpful error
- opensoundscape.ml.utils.collate_audio_samples_to_tensors(batch)[source]
takes a list of AudioSample objects, returns batched tensors
Use this collate function with DataLoader if you want to use AudioFileDataset but want the traditional output of PyTorch DataLoaders, which return two tensors:
the first is a tensor of the data with dim 0 as batch dimension, the second is a tensor of the labels with dim 0 as batch dimension
- Parameters:
batch – a list of AudioSample objects
- Returns:
(Tensor of stacked AudioSample.data, Tensor of stacked AudioSample.label.values)
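Example
from torch.utils.data import DataLoader
from opensoundscape import AudioFileDataset, SpectrogramPreprocessor
from opensoundscape.ml.utils import collate_audio_samples_to_tensors

# label_df is assumed to be an existing dataframe of labels
preprocessor = SpectrogramPreprocessor(sample_duration=2, height=224, width=224)
audio_dataset = AudioFileDataset(label_df, preprocessor)

train_dataloader = DataLoader(
    audio_dataset,
    batch_size=64,
    shuffle=True,
    collate_fn=collate_audio_samples_to_tensors,
)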
- opensoundscape.ml.utils.get_batch(array, batch_size, batch_number)[source]
get a single slice of a larger array
using the batch size and batch index, from zero
- Parameters:
array – iterable to split into batches
batch_size – num elements per batch
batch_number – index of batch
- Returns:
one batch (subset of array)
Note: the final elements are returned as the last batch even if there are fewer than batch_size
Example
if array=[1,2,3,4,5,6,7] then:
get_batch(array,3,0) returns [1,2,3]
get_batch(array,3,2) returns [7]
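The slicing behavior can be reproduced with a short sketch. The name get_batch_sketch is hypothetical and the body is one plausible way to get the documented behavior with zero-based batch indices, not necessarily the library's code.

```python
def get_batch_sketch(array, batch_size, batch_number):
    """Return the batch_number-th slice (counting from zero) of size batch_size."""
    start = batch_size * batch_number
    # clamp the end so the final batch may hold fewer than batch_size elements
    end = min(start + batch_size, len(array))
    return array[start:end]


array = [1, 2, 3, 4, 5, 6, 7]
first = get_batch_sketch(array, 3, 0)   # first full batch
middle = get_batch_sketch(array, 3, 1)  # second full batch
last = get_batch_sketch(array, 3, 2)    # final, shorter batch
```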
opensoundscape.ml.export module
- class opensoundscape.ml.export.SequentialModelExporter(*args: Any, **kwargs: Any)[source]
Bases:
Module
- opensoundscape.ml.export.to_onnx_program(preprocessing_transforms, torch_model, input_length, activation_layer=None, include_preprocessor_output=True, include_embedding_output=True, include_classifier_output=True, opset_version=18, **kwargs)[source]
Export a torch model with preprocessing transforms to ONNX format
Attempts to separate embedding and classifier outputs from torch_model, if torch_model has attribute ‘classifier_layer’ indicating the name of the layer that should be considered the “classifier”. The remaining layers are considered the “embedding” portion of the network. There should be no layers after the classifier layer.
Optionally adds a sigmoid or softmax activation layer on the classifier outputs.
Requires that the onnx, onnxruntime, and onnxscript packages are installed
- Parameters:
preprocessing_transforms – torch.nn.Module, preprocessing transforms to apply to raw audio
torch_model – torch.nn.Module, model to export
input_length – int, length of input audio samples in number of samples
activation_layer – str or None, activation layer to apply to classifier outputs; options: None, ‘softmax’, ‘sigmoid’
include_preprocessor_output – bool, whether to include preprocessor output in ONNX model outputs
include_embedding_output – bool, whether to include embedding output in ONNX model outputs
include_classifier_output – bool, whether to include classifier output in ONNX model outputs
opset_version – int, ONNX opset version to use for export; currently defaults to 18 because of issues with dynamic shapes in opset 20 with pytorch 2.9.0; should be upgraded to 20 when stable fixes are released
**kwargs – additional keyword arguments to pass to torch.onnx.export
- Returns:
onnx_model: ONNX program model object
Example:
```python
from opensoundscape import Audio, Spectrogram, CNN, BoxedAnnotations, preprocessors
from opensoundscape.ml.export import to_onnx_program

model = CNN(
    architecture="efficientnet_b0",
    classes=[0, 1, 2, 3],
    sample_duration=3,
    preprocessor_cls=preprocessors.TorchSpectrogramPreprocessor,
    sample_rate=32000,
)
# a list of torchaudio preprocessing transforms such as Spectrogram, MelSpectrogram, etc.
transforms = model.preprocessor["transform"].transforms

# expected number of samples in input audio: 3*32000
input_length = model.preprocessor.sample_rate * model.preprocessor.sample_duration

onnx_program = to_onnx_program(
    preprocessing_transforms=transforms,
    torch_model=model.network,
    input_length=input_length,
    include_preprocessor_output=True,
)
```
opensoundscape.ml.song_space module
- class opensoundscape.ml.song_space.SongSpace(path, feature_extractor='perch2', sample_duration=None)[source]
Bases:
object
SongSpace is a framework for training and applying classifiers, combining a feature extractor and database
A SongSpace couples a feature extractor (e.g., BirdNET or Perch) with a database that stores embeddings of audio clips. One or more shallow classifiers can be added, along with labeled training and evaluation datasets.
It provides utilities for:
- ingesting audio datasets by saving their deep learning embeddings in a database
- creating and evaluating (shallow) classifiers
- applying a classifier to embeddings in a hoplite database with filtering by metadata and scores
- selecting top-scoring or random clips from the database based on classifier predictions and filters
- embedding-based similarity search
The main purpose of this class is to enable users to easily complete an active learning loop:
- start with a few labeled samples and a bunch of unlabeled audio
- embed everything
- use similarity search, shallow classifiers, or targeted/random search to find clips
- review clips and label more data
- apply the final classifier to select clips for manual verification
- end with manually verified detections for downstream analysis
- potentially repeat with other species/classes
- Parameters:
path (str) – The path to the SongSpace directory
feature_extractor (str or model) – The feature extractor to use for embedding audio clips. Can be a string key for a model in the bioacoustics model zoo (“bs-convnext”, “birdnet”, “perch”, “perch2”) or a custom model object with an embed() method and a classifier attribute with an in_features property indicating the embedding dimension.
sample_duration (float) – duration of audio clips to embed and classify, in seconds; if None, uses the default sample duration of the feature extractor
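One round of the active learning loop described above might look like the following sketch. It calls only methods documented for this class (ingest_audio, fit_classifier, select) with their documented argument names. The helper active_learning_round and the dataset names "labeled" and "unlabeled" are hypothetical, and how labels are carried in the samples dataframe is not specified here.

```python
def active_learning_round(song_space, labeled_samples, unlabeled_samples, classes, k=5):
    """Embed audio, fit a shallow classifier, and pick top-scoring clips to review.

    song_space is expected to behave like a SongSpace instance.
    """
    # store embeddings for the labeled clips in a dataset usable for training
    song_space.ingest_audio(labeled_samples, dataset_name="labeled")
    # store embeddings for unlabeled clips; not used for training
    song_space.ingest_audio(
        unlabeled_samples, dataset_name="unlabeled", allow_training=False
    )
    # fit a shallow classifier on the stored embeddings, skipping validation
    classifier = song_space.fit_classifier(
        classes=classes, train_datasets=["labeled"], validation_dataset=None
    )
    # rank clips in the database and return the top k per class for review
    return song_space.select(
        classifier=classifier, classes=classes, k=k, strategy="top_k"
    )
```

After reviewing the returned clips, the newly labeled samples can be ingested again and the round repeated.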
- property database
The database object used to store embeddings for this SongSpace
property to protect from accidental modification
- property db
alias for self.database
- evaluate(classifier_name, dataset_name, batch_size=1024)[source]
Evaluate a classifier on a specified dataset and return metrics
- fit_classifier(classes, train_datasets, validation_dataset, weak_negatives_proportion=2, batch_size=128, steps=1000, optimizer=None, criterion=None, device='cpu', early_stopping_patience=None, logging_interval=100, validation_interval=1, classifier_hidden_layers=(), weak_negatives_weight=0.01)[source]
Fit a classifier on embeddings from the database for a given dataset
Note: Before fitting a classifier, ingest and create audio datasets with ingest_audio()
- Parameters:
classes – list of class names to train the classifier for; if None, trains for every class in the dataset(s)
train_datasets – list of dataset names to use for training; must have been added with ingest_audio()
validation_dataset – dataset name to use for validation; if None, skips validation
weak_negatives_proportion – ratio of weak negatives to positives to add to the training data. Selects random unlabeled samples from the database and treats them as no-species samples, but with a small weight in the loss function. The default of 2 means adding 2 weak negatives for every labeled sample; if 0, does not add any weak negatives. Ignored if criterion is passed.
embedding_batch_size – batch size for embedding; default 1
embedding_num_workers – number of workers for embedding; default 0
batch_size – model fitting parameters, see fit()
steps – model fitting parameters, see fit()
optimizer – model fitting parameters, see fit()
criterion – model fitting parameters, see fit()
device – model fitting parameters, see fit()
early_stopping_patience – if provided, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]
logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed
validation_interval – how often to validate the model during training; if validation_dataset is provided, validation is performed every validation_interval steps
audio_root – if provided, used as prefix for audio files in train_df and validation_df; if None, assumes train_df and validation_df already have absolute audio paths
classifier_hidden_layers – tuple of hidden layer sizes for the MLPClassifier; default is () for no hidden layers (i.e. linear probe / logistic regression)
weak_negatives_weight – weight for the weak negative samples in the loss function; default 0.01; ignored if criterion is passed
Returns: new classifier
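The effect of weak_negatives_proportion and weak_negatives_weight can be pictured with a standalone sketch. The function name, the row layout, the weight of 1.0 for labeled samples, and the use of label 0 for weak negatives are assumptions for illustration; the actual training code may differ.

```python
import random


def add_weak_negatives(labeled, unlabeled_pool, proportion=2, weight=0.01, seed=0):
    """Return (sample_id, label, loss_weight) rows for training.

    labeled: list of (sample_id, label) pairs
    unlabeled_pool: list of sample ids with no labels
    """
    rng = random.Random(seed)
    # proportion=2 requests two weak negatives per labeled sample,
    # capped by how many unlabeled samples are available
    n_weak = min(int(proportion * len(labeled)), len(unlabeled_pool))
    weak_ids = rng.sample(unlabeled_pool, n_weak)

    rows = [(sample_id, label, 1.0) for sample_id, label in labeled]
    # weak negatives are treated as absent (label 0) but contribute little to the loss
    rows += [(sample_id, 0, weight) for sample_id in weak_ids]
    return rows


rows = add_weak_negatives([("a", 1), ("b", 0)], list(range(10)))
```

With two labeled samples and the defaults, four weak negatives are added, each counting for 1% of a labeled sample in the loss.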
- get_dataset_embeddings(dataset_name)[source]
Utility to get the embeddings and labels for a given dataset as numpy arrays
- ingest_audio(samples, dataset_name, file_to_deployment=<function parent_folder_name>, allow_training=True, audio_root=None, embedding_exists_mode='skip', file_to_datetime=aru_metadata_parser.parse.ARUFileTimestampParser.parse, **kwargs)[source]
Embed samples using the feature extractor and store in a new or existing dataset
- Parameters:
samples – dataframe with columns “file”, “start_time”, “end_time” specifying clips to embed
dataset_name – name of the dataset to store the embeddings in
- if existing, combines with the existing dataset of the same name, taking the new labels in the case of conflicts
- if not existing, creates a new dataset with the given name, using allow_training and audio_root to set up the dataset parameters
Also uses dataset_name as the ‘project’ name for the deployment in the database
file_to_deployment – str, function, or dictionary mapping filenames to deployment names
- if function, should take a single argument (filename: str) and return a deployment name (str)
- if dictionary or pd.Series, should map filenames (str) to deployment names (str)
- if str, the name of the deployment that all samples will be associated with
- if deployment does not exist in db, it will be created
Utility functions for common patterns are provided in opensoundscape.utils, including parent_folder_name, two_parents_name, second_parent_name, filename_first_part (an LLM would also be great at writing a custom function given your deployment:audio file structure)
allow_training – if True, allows using this dataset for training classifiers; if False, dataset can still be used for validation but not training; default True
audio_root – if provided, used as prefix for audio files in samples; if None, assumes samples already have absolute audio paths. If full paths are provided and audio_root is provided, converts to relative paths by stripping audio_root from the start of the paths in samples before embedding and storing in the database (see also: update_dataset_audio_root() to update audio_root if you move the entire audio dataset)
embedding_exists_mode – ‘skip’, ‘error’, or ‘add’ [default: ‘skip’]; how to handle cases where an embedding already exists in the database (‘replace’ is not yet implemented)
skip: skip embedding and keep the existing embedding
error: raise an error if an embedding already exists for a clip in samples
add: add a new embedding alongside the existing one (e.g. for augmented variations of the same clip)
file_to_datetime – optional function or dictionary mapping filenames to datetime objects; used to set recording start times in the database. Default: uses a flexible parser from aru_metadata_parser.parse that handles most formats
**kwargs – additional keyword arguments to pass to the feature extractor’s embed() method
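A custom file_to_deployment function only needs to take a filename and return a deployment name. The sketch below assumes a hypothetical folder layout of project/deployment/date/file.wav; the function name and paths are made up for illustration.

```python
from pathlib import PurePosixPath


def deployment_from_grandparent(filename: str) -> str:
    """Return the folder two levels above the file as the deployment name.

    Example layout (hypothetical): project/site_A/2024-05-01/rec_001.wav
    """
    return PurePosixPath(filename).parent.parent.name


files = [
    "project/site_A/2024-05-01/rec_001.wav",
    "project/site_B/2024-05-02/rec_014.wav",
]
# the same mapping expressed as a dictionary, the other accepted form
file_to_deployment = {f: deployment_from_grandparent(f) for f in files}
```

Either the function itself or the resulting dictionary could be passed as file_to_deployment.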
- metrics(predictions, labels, classes)[source]
Compute evaluation metrics for a set of predictions and true labels
- classmethod open(path, feature_extractor=None)[source]
Open an existing SongSpace from a specified path
if the feature_extractor is not one of the registered bioacoustics model zoo options (“bs-convnext”, “birdnet”, “perch”, “perch2”), create the feature extractor used previously, then pass it to this method.
- predict_on_dataset(classifier_name, dataset_name, batch_size=1024, return_df=True)[source]
Apply a classifier to a dataset and return predictions as a dataframe with the same index as the dataset’s label_df and columns for each class
- save()[source]
Save the SongSpace metadata to the SongSpace path, so that it can be re-loaded later with SongSpace.open()
- select(classifier, classes, k=5, strategy: Literal['top_k', 'random_k', 'all'] = 'top_k', batch_size=1024, date_range=None, time_range=None, min_score=None, max_score=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, random_state=None, return_windows=False, progress_bar=False, warn_no_matches=False)[source]
Extract top-scoring or random clips from the database based on classifier predictions and filters
- Parameters:
db – hoplite database containing embeddings
classifier – classifier to apply to embeddings in the database to generate clip ranking scores: an MLPClassifier object or other classifier object to call on the torch.tensor embeddings, or the name (str) of a classifier in the self.classifiers dictionary; must have a ‘classes’ attribute listing the class names, including the classes specified in the classes argument
classes – list of class names to select clips for; if None, selects clips for every class in classifier
k – number of clips to return per class; default 5 (ignored if strategy=”all”)
strategy – which clips to select [default: “top_k”]
“top_k”: return the top k clips for each class
“random_k”: return k random clips
“all”: return all clips (ignores k)
batch_size – n samples simultaneously processed when applying classifier to embeddings; default 1024
date_range – tuple of (start_date, end_date) to filter clips by date; if None, does not filter by date. Formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format. Can pass (date, None) or (None, date) to filter by only start or end date, respectively
time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day. Formats: datetime.datetime, datetime.time, or string in “HH:MM:SS” format. Note: filters by time of day of the _recording_ start time (rather than audio clip start time). Assumes the time zone of the time_range values matches that of the recording timestamps in the database
min_score – minimum score to filter clips by existing score in the database; if None, does not threshold by min score
max_score – maximum score to filter clips by existing score in the database; if None, does not restrict by max score
deployments – list of deployment names to filter by; if None, does not filter by deployment
projects – list of project names to filter by; if None, does not filter by project
recordings – list of recording names to filter by; if None, does not filter by recording
deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument
recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument
windows_filter – custom filter dict for windows; if provided, overrides date_range, time_range arguments
annotations_filter – custom filter dict for annotations in hoplite DB
warn_no_matches – if True, raises a warning if no clips are found for a class after applying filters and score thresholds; default False
- Returns:
dict of {class_name: list of matching windows} if return_windows=True; otherwise a dataframe with columns for class, score, and window info
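The three strategies and the score thresholds can be illustrated on a plain list of (clip, score) pairs for a single class. This is a standalone sketch, not the method's implementation: the function name is hypothetical, and treating min_score and max_score as inclusive bounds on the scores passed in is an assumption.

```python
import random


def select_clips(scored_clips, k=5, strategy="top_k",
                 min_score=None, max_score=None, random_state=None):
    """Filter (clip, score) pairs by score, then pick clips by strategy."""
    kept = [
        (clip, score)
        for clip, score in scored_clips
        if (min_score is None or score >= min_score)
        and (max_score is None or score <= max_score)
    ]
    if strategy == "all":
        return kept  # k is ignored
    if strategy == "top_k":
        return sorted(kept, key=lambda pair: pair[1], reverse=True)[:k]
    if strategy == "random_k":
        rng = random.Random(random_state)  # seeded for reproducible draws
        return rng.sample(kept, min(k, len(kept)))
    raise ValueError(f"unknown strategy: {strategy}")


clips = [("a", 0.9), ("b", 0.2), ("c", 0.6), ("d", 0.75)]
top_two = select_clips(clips, k=2)
confident = select_clips(clips, strategy="all", min_score=0.5)
random_two = select_clips(clips, k=2, strategy="random_k", random_state=1)
```

Random selection within a score range is useful for estimating precision at a threshold, while top_k surfaces the most confident detections first.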
- similarity_search(query_samples, k=5, exact_search=False, search_subset_size=None, target_score=None, audio_root=None, search_kwargs=None, **embedding_kwargs)[source]
Find the k most similar embeddings in the database to each query audio sample
- Parameters:
query_samples – audio file path, list of files, or dataframe with columns “file”, “start_time”, “end_time” specifying clips to embed and search for
k – number of similar samples to return; default 5
exact_search – default (False) uses an approximate nearest neighbor search for speed; if True, uses exact search for maximum recall but slower speed
search_subset_size – if provided, limits the search to a random subset of all samples
target_score – if provided, returns samples close to the target similarity score rather than _most_ similar samples - useful for finding samples that are similar but not too similar to the query samples
audio_root – if provided, used as prefix for audio files in query_samples; if None, assumes query_samples already have absolute audio paths
search_kwargs – dict of additional keyword arguments passed to db.ui.search(), or to brutalism.threaded_brute_search() if exact_search=True
exact_search=False: radius, threads, exact, log, progress
exact_search=True: batch_size, max_workers, rng_seed
**embedding_kwargs – additional keyword arguments passed to self.embed(), such as batch_size and num_workers
- Returns:
A dataframe with the same columns as the database metadata and an additional ‘similarity’ column, sorted by similarity to the query embedding
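The difference between returning the most similar samples and returning samples near a target_score can be shown with an exact (brute-force) search over a few toy embeddings. Cosine similarity is an assumption made for this sketch; the similarity measure used by the database is not stated here, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, embeddings, k=5, target_score=None):
    """Score every stored embedding against the query and return k results.

    embeddings: dict mapping clip name -> vector
    """
    scored = [(name, cosine(query, vector)) for name, vector in embeddings.items()]
    if target_score is None:
        # most similar first
        scored.sort(key=lambda pair: pair[1], reverse=True)
    else:
        # closest to the target similarity first
        scored.sort(key=lambda pair: abs(pair[1] - target_score))
    return scored[:k]


embeddings = {"same": [1.0, 0.0], "close": [1.0, 1.0], "orthogonal": [0.0, 1.0]}
nearest = brute_force_search([1.0, 0.0], embeddings, k=1)
near_target = brute_force_search([1.0, 0.0], embeddings, k=1, target_score=0.7)
```

Searching near a target score below 1 skips near-duplicates of the query and surfaces clips that are related but distinct.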
- stratified_selection(classifier, classes, stratify_deployments=True, stratify_day=False, date_ranges=None, stratify_recordings=False, stratify_datasets=False, k=5, strategy: Literal['top_k', 'random_k', 'all'] = 'top_k', batch_size=1024, date_range=None, time_range=None, min_score=None, max_score=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, random_state=None, progress_bar=False, warn_no_matches=False)[source]
Perform stratified selection of clips based on classifier predictions and filters
- Parameters:
classifier – classifier to apply to embeddings in the database to generate clip ranking scores; see select() for details
classes – list of class names to select clips for; if None, selects clips for every class in classifier
stratify_deployments – whether to stratify selection by deployment; default True
stratify_day – whether to stratify selection by day; default False
date_ranges – optional list of inclusive (start_date, end_date) ‘YYYY-MM-DD’ tuples for stratification - if provided, stratify_day is ignored and stratification is based on these date ranges instead
stratify_recordings – whether to stratify selection by recording (audio file); default False
stratify_datasets – whether to stratify selection by dataset; default False
k – number of clips to return per class per stratum; default 5 (ignored if strategy=”all”)
strategy – which clips to select [default: “top_k”]
“top_k”: return the top k clips for each class in each stratum
“random_k”: return k random clips for each class in each stratum
“all”: return all clips (ignores k)
batch_size, date_range, time_range, min_score, max_score, deployments, projects, recordings, deployments_filter, recordings_filter, windows_filter, annotations_filter, random_state, return_windows, progress_bar, warn_no_matches – see select() for details on these arguments
- Returns:
dict of {stratum_value: {class_name: list of matching windows}} if return_windows=True; otherwise a dataframe with columns for stratum_value, class, score, and window info
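Stratified selection amounts to grouping clips by stratum and applying the per-class selection inside each group. The sketch below stratifies by deployment only, for a single class, with the top_k strategy; the function name and row layout are hypothetical.

```python
from collections import defaultdict


def stratified_top_k(rows, k=5):
    """Return the k highest-scoring rows within each deployment.

    rows: list of dicts with keys 'deployment', 'clip', and 'score'
    """
    strata = defaultdict(list)
    for row in rows:
        strata[row["deployment"]].append(row)
    return {
        deployment: sorted(group, key=lambda r: r["score"], reverse=True)[:k]
        for deployment, group in strata.items()
    }


rows = [
    {"deployment": "A", "clip": "a1", "score": 0.95},
    {"deployment": "A", "clip": "a2", "score": 0.90},
    {"deployment": "A", "clip": "a3", "score": 0.85},
    {"deployment": "B", "clip": "b1", "score": 0.40},
]
selected = stratified_top_k(rows, k=2)
```

A global top-2 would return only clips from deployment A; stratifying guarantees that deployment B is represented even though its best score is lower.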
- update_dataset_audio_root(name, new_audio_root)[source]
Update the audio_root for a given dataset, which is used as the prefix for audio file paths when embedding new samples and searching for existing embeddings in the database
This is useful if you need to move your audio files after ingesting a dataset, or if you originally ingested with incorrect audio paths.
Note that this does not change the file paths in the label_df, but rather updates the audio_root that is prefixed to those file paths when embedding new samples or searching for existing embeddings in the database.
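The relationship between audio_root and the stored relative paths can be sketched with pathlib. The helper names and the example paths are hypothetical; the sketch only illustrates why changing the root is enough after moving a dataset.

```python
from pathlib import PurePosixPath


def to_relative(audio_root, full_path):
    """Strip audio_root from the start of a full path, as done at ingestion."""
    return str(PurePosixPath(full_path).relative_to(audio_root))


def resolve(audio_root, relative_file):
    """Prefix a stored relative path with the current audio_root."""
    return str(PurePosixPath(audio_root) / relative_file)


# path stored when the dataset was ingested under the old root
stored = to_relative("/data/audio", "/data/audio/site_A/rec_001.wav")
# after moving the files, only the root changes; stored paths stay the same
moved = resolve("/mnt/archive/audio", stored)
```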