opensoundscape.ml package

Submodules

opensoundscape.ml.cam module

Class activation maps (CAM) for OpenSoundscape models

class opensoundscape.ml.cam.CAM(base_image, activation_maps=None, gbp_maps=None)

Bases: object

Object to hold and view Class Activation Maps, including guided backprop

Stores activation maps as .activation_maps and guided backprop maps as .gbp_maps; each is a pd.Series indexed by class.

create_rgb_heatmaps(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), gbp_normalization_q=99)

Create an RGB numpy array of heatmaps overlaid on the sample.

Optionally select a subset of classes and the activation/backprop mode.

Parameters:
  • class_subset – iterable of classes to visualize with activation maps [default: None plots all classes]. Each item must be in the index of self.gbp_maps / self.activation_maps. Note that a class None is created by CNN.generate_cams() when classes are not specified.

  • mode – str selecting which maps to visualize, one of: ‘activation’ [default]: overlay the activation map; ‘backprop’: overlay the guided backpropagation result; ‘backprop_and_activation’: overlay the product of both maps; None: do not overlay anything on the original sample

  • show_base – if False, does not plot the image of the original sample [default: True]

  • alpha – opacity of the activation map overlay [default: 0.5]

  • color_cycle – iterable of colors for activation maps; cycles through the list using one color per class

  • gbp_normalization_q – guided backprop is normalized such that the q-th percentile of the map is 1 [default: 99]. This helps avoid gbp maps that are too dark to see. Lower values make brighter and noisier maps; higher values make darker and smoother maps.

Returns:

numpy array of shape [w, h, 3] representing the image with CAM heatmaps
  • if mode is None, returns the original sample
  • if show_base is False, returns just the heatmaps
  • if mode is None and show_base is False, returns None
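The overlay can be pictured as per-pixel alpha blending of a class color onto the base image. The following is a minimal numpy sketch, not OpenSoundscape's implementation. The helpers `hex_to_rgb` and `overlay_heatmap` are hypothetical. The sketch assumes heatmaps are already scaled to [0, 1] and share the base image's spatial shape. Blending several classes one after another is also an assumption made for illustration.

```python
import numpy as np


def hex_to_rgb(hex_color):
    """Convert a color string such as '#067bc2' to RGB floats in [0, 1]."""
    hex_color = hex_color.lstrip("#")
    return np.array([int(hex_color[i : i + 2], 16) for i in (0, 2, 4)]) / 255.0


def overlay_heatmap(base, heatmap, color="#067bc2", alpha=0.5):
    """Hypothetical helper: alpha-blend one class's heatmap onto a base image.

    base: greyscale [rows, cols] or RGB [rows, cols, 3] array, values in [0, 1]
    heatmap: [rows, cols] array, values in [0, 1]
    """
    base = np.asarray(base, dtype=float)
    heatmap = np.asarray(heatmap, dtype=float)
    if base.ndim == 2:
        # copy the single channel across 3 RGB channels (greyscale image)
        base = np.stack([base] * 3, axis=-1)
    # per-pixel opacity: alpha where the heatmap is 1, transparent where it is 0
    weight = alpha * heatmap[..., None]
    return (1 - weight) * base + weight * hex_to_rgb(color)


# blend two classes in turn, one color per class as with color_cycle
base = np.full((4, 4), 0.5)
maps = {"class_a": np.eye(4), "class_b": np.fliplr(np.eye(4))}
colors = ("#067bc2", "#43a43d")
rgb = base
for cam, color in zip(maps.values(), colors):
    rgb = overlay_heatmap(rgb, cam, color=color, alpha=0.5)
print(rgb.shape)  # (4, 4, 3)
```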

plot(class_subset=None, mode='activation', show_base=True, alpha=0.5, color_cycle=('#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'), figsize=None, plt_show=True, save_path=None, gbp_normalization_q=99, flipud=False)

Plot per-class activation maps, guided backpropagations, or their products

Do not pass both mode=None and show_base=False.

Parameters:
  • class_subset – iterable of classes to visualize with activation maps [default: None plots all classes]. Each item must be in the index of self.gbp_maps / self.activation_maps. Note that a class None is created by CNN.generate_cams() when classes are not specified.

  • mode – str selecting which maps to visualize, one of: ‘activation’ [default]: overlay the activation map; ‘backprop’: overlay the guided backpropagation result; ‘backprop_and_activation’: overlay the product of both maps; None: do not overlay anything on the original sample

  • show_base – if False, does not plot the image of the original sample [default: True]

  • alpha – opacity of the activation map overlay [default: 0.5]

  • color_cycle – iterable of colors for activation maps; cycles through the list using one color per class

  • gbp_normalization_q – guided backprop is normalized such that the q-th percentile of the map is 1 [default: 99]. This helps avoid gbp maps that are too dark to see. Lower values make brighter and noisier maps; higher values make darker and smoother maps.

  • figsize – the figure size for the plot [default: None]

  • plt_show – if True, runs plt.show() [default: True] - ignored if return_numpy=True

  • save_path – path to save the image to [default: None, does not save a file]

  • flipud – if True, flips the image vertically before plotting [default: False]

Returns:

(fig, ax) of matplotlib figure, or np.array if return_numpy=True

Note: if base_image does not have 3 channels, channels are averaged then copied across 3 RGB channels to create a greyscale image

Note 2: If return_numpy is True, fig and ax are never created; it simply creates a numpy array representing the image with the CAMs overlaid and returns it

opensoundscape.ml.cam.normalize_q(x, q=99)

Normalize x such that q-th percentile value is 1.0
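The stated behavior fits in a couple of lines of numpy. The sketch below re-implements it under the same name for illustration only; it is not the library source. It assumes the q-th percentile is non-zero, and it does not clip values above that percentile, so those end up greater than 1.0.

```python
import numpy as np


def normalize_q(x, q=99):
    """Sketch: rescale x so that its q-th percentile becomes 1.0.

    Assumes the q-th percentile is non-zero. Values above that percentile
    end up greater than 1.0 (no clipping is applied here).
    """
    x = np.asarray(x, dtype=float)
    return x / np.percentile(x, q)


scaled = normalize_q(np.arange(101))  # values 0, 1, ..., 100
print(scaled[99], scaled[100])
```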

opensoundscape.ml.cnn module

classes for pytorch machine learning models in opensoundscape

For tutorials, see notebooks on opensoundscape.org

opensoundscape.ml.cnn.CNN

alias of SpectrogramClassifier

exception opensoundscape.ml.cnn.ChannelDimCheckError

Bases: Exception

class opensoundscape.ml.cnn.SpectrogramClassifier(architecture, classes, sample_duration, sample_rate, single_target=False, preprocessor_dict=None, preprocessor_cls=<class 'opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor'>, device=None, **preprocessor_kwargs)

Bases: SpectrogramModule

defines pure pytorch train, predict, and eval methods for a spectrogram classifier

batch_forward(batch_samples, targets=None, avgpool=True)

Forward pass for a batch of data

Parameters:
  • batch_samples – a batch of samples from a dataloader

  • targets – list of layers from self.network to extract outputs from. The key self.class_outputs_key (-1 by default) corresponds to the final model output. If None, only returns the final model output.

  • avgpool – bool, if True, applies global average pooling to intermediate outputs (average across all dimensions except the first to get a 1D vector per sample)

Returns:

dictionary with a key for each output requested in targets. The key matching self.class_outputs_key corresponds to the final model output.
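Global average pooling collapses a feature map to one vector per sample. The numpy sketch below assumes intermediate outputs are shaped [batch, channels, *spatial]. It averages over every axis after the channel axis. This is our reading of "all dimensions except the first", taking "first" to be each sample's channel axis, since the result is described as a 1D vector per sample. The helper name `global_avg_pool` is hypothetical.

```python
import numpy as np


def global_avg_pool(features):
    """Hypothetical sketch of global average pooling for embeddings.

    features: array shaped [batch, channels, *spatial_dims]
    returns: array shaped [batch, channels], one 1D vector per sample
    """
    features = np.asarray(features, dtype=float)
    spatial_axes = tuple(range(2, features.ndim))  # every axis after channels
    return features.mean(axis=spatial_axes)


emb = global_avg_pool(np.random.rand(8, 512, 4, 10))
print(emb.shape)  # (8, 512)
```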

current_step

track number of complete training steps

property device
early_stopping_config

Early stopping configuration dictionary.

Early stopping halts training if the validation score does not improve for a specified number of steps (patience).

The metric monitored for improvement is defined by self.score_metric; adjust “mode” according to whether the score should be minimized (loss) or maximized (accuracy, f1, auroc, avg precision, etc.).

To enable early stopping, set self.early_stopping_config['enabled'] = True and modify other parameters as desired:

  • ‘patience’: number of steps with no improvement before stopping
  • ‘min_delta’: minimum change in the monitored quantity to qualify as an improvement
  • ‘mode’: ‘max’ or ‘min’, whether to look for the maximum (e.g. accuracy) or minimum (e.g. loss) of the monitored quantity
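The interplay of ‘patience’, ‘min_delta’, and ‘mode’ can be sketched as a small stateful check. The class `EarlyStopper` is hypothetical and is not OpenSoundscape's implementation. Each call to `update()` stands for one scored step in the sense used above.

```python
class EarlyStopper:
    """Hypothetical sketch of patience-based early stopping."""

    def __init__(self, patience=3, min_delta=0.0, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best = None
        self.bad_checks = 0  # consecutive scored steps without improvement

    def _improved(self, score):
        if self.best is None:
            return True
        if self.mode == "max":
            return score > self.best + self.min_delta
        return score < self.best - self.min_delta

    def update(self, score):
        """Record a validation score; return True if training should stop."""
        if self._improved(score):
            self.best = score
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2, min_delta=0.01, mode="max")
for step, score in enumerate([0.50, 0.60, 0.605, 0.58]):
    if stopper.update(score):
        print(f"stopping at check {step}, best score {stopper.best}")
        break
# prints: stopping at check 3, best score 0.6
```

Note that 0.605 does not count as an improvement over 0.6 because it falls within min_delta.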

embed(samples, batch_size=1, num_workers=0, target_layer=None, progress_bar=True, return_preds=False, avgpool=True, return_dfs=True, audio_root=None, output_size_warning=1000000000.0, **dataloader_kwargs)

Generate embeddings (intermediate layer outputs) for audio files/clips

Note: to capture embeddings on multiple layers, use self.__call__ with intermediate_layers argument directly. This wrapper only allows one target_layer.

Note: Output can be n-dimensional array (return_dfs=False) or pd.DataFrame with multi-index like .predict() (return_dfs=True). If avgpool=False, return_dfs is forced to False since we can’t create a DataFrame with >2 dimensions.

For advanced use cases (e.g. multiple target layers), use self.__call__() directly.

Parameters:
  • samples – same as CNN.predict(): file path, list of file paths, OR pd.DataFrame with index containing audio file paths, OR a pd.DataFrame with multi-index (file, start_time, end_time)

  • batch_size – batch size to use for dataloader [default: 1]

  • num_workers – number of parallel CPU workers to use for dataloader [default: 0]

  • target_layer – layer from self.model._modules to extract outputs from - if None, attempts to use self.model.embedding_layer as default

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • return_preds – bool, if True, returns two outputs (embeddings, logits) [default: False]

  • avgpool – bool, if True, applies global average pooling to intermediate outputs, i.e. averages across all dimensions except the first to get a 1D vector per sample [default: True]

  • return_dfs – bool, if True, returns embeddings as pd.DataFrame with multi-index like .predict(); if False, returns np.array of embeddings [default: True]. If avgpool=False, overrides to return np.array since a DataFrame cannot have >2 dimensions

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • **dataloader_kwargs – passed to self.predict_dataloader()

Returns: (embeddings, preds) if return_preds=True or embeddings if return_preds=False

types are pd.DataFrame if return_dfs=True, or np.array if return_dfs=False

embed_to_hoplite_db(samples, db, deployment, project=None, file_to_datetime=None, target_layer=None, wandb_session=None, progress_bar=True, audio_root=None, embedding_exists_mode='skip', commit_frequency_batches=100, overflow_mode='warn', embedding_dim=None, strict_matching=False, **dataloader_kwargs)

Run inference on a dataloader, saving 1D outputs of target_layer to a hoplite database

Note that all samples are associated with a single deployment (e.g. one audio recorder in one season). Call this method separately for each deployment to associate samples with different deployments in the database.

Parameters:
  • samples – (same as CNN.predict())

  • db – a hoplite database object or a path to a hoplite database folder - if a path is provided, the database will be created if it does not exist - when creating a new db, the embedding_dim argument must be provided

  • deployment

    name of deployment (i.e. one recorder deployed once) to associate embeddings with. If the deployment does not exist in db, it will be created. If you wish to include metadata per deployment (e.g. lat, lon, point name), first add the deployment to the db using perch_hoplite.db.interface.HopliteDB.insert_deployment()

  • project – optional project name to associate deployment with

  • file_to_datetime

    optional function or dictionary mapping filenames to datetime objects, used to set recording start times in the database. If None, recording start times will not be set.

    • if a function is provided, it should take a single argument (filename: str) and return a datetime.datetime object

    • if a dictionary is provided, it should map filenames (str) to datetime.datetime objects

  • target_layer

    layer to extract embeddings from. If None [default], attempts to use the architecture’s default target_layer. Note: only architectures created with opensoundscape 0.9.0+ will have a default target layer. See pytorch_grad_cam docs for suggestions. Note: if multiple layers are provided, the activations are merged across layers (rather than returning separate activations per layer)

  • wandb_session – a wandb session to log progress to (e.g. return value of wandb.init())

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • audio_root – the root directory for relative paths to audio files

  • embedding_exists_mode

    str, behavior when an embedding already exists for the same source and offset. Options are:

    “skip”: skip inserting the embedding (default); “error”: raise an error; “add”: add a new embedding entry to the db with the same source info

    Note: the strict_matching argument affects whether existing embeddings are only matched within a deployment/project or across all deployments/projects.

    Note that hoplite doesn’t currently support removing or replacing existing entries

  • commit_frequency_batches – int, commit to db after every N batches [default: 100]

  • overflow_mode – ‘warn’, ‘error’, or ‘ignore’: behavior when embedding values exceed the range of float16, which is the range of values allowed in hoplite db [default: ‘warn’]

  • embedding_dim – int, dimension of the embeddings to be stored - only used when creating a new hoplite db - must match the output dimension of the model’s target_layer - if creating new db and embedding_dim is None, guesses based on self.classifier.in_features

  • strict_matching

    bool, select strategy for matching existing deployments and embeddings

    • if True, deployments are only considered matching if both deployment name and project match; embeddings are only considered matching if project, deployment, source_id, and offset all match

    • if False [default], deployments from any project are matched by name only; embeddings are matched across all deployments and projects by source_id and offset only

  • **dataloader_kwargs – additional keyword arguments to pass to the dataloader
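As an example of the function form of file_to_datetime, the sketch below parses a start time from recordings named like `20230415_063000.WAV`. That naming pattern is an assumption, so adapt the format string to your own files. The function handles full paths by parsing only the file stem. It returns naive datetimes, and whether the database expects timezone-aware values is not covered here.

```python
import datetime
from pathlib import Path


def file_to_datetime(filename):
    """Parse a recording start time from names like '20230415_063000.WAV'.

    Hypothetical helper: assumes the file stem is YYYYMMDD_HHMMSS.
    The argument may be a full path; only the stem is parsed.
    """
    stem = Path(filename).stem
    return datetime.datetime.strptime(stem, "%Y%m%d_%H%M%S")


# the dictionary form maps each filename directly to its start time
start_times = {"rec_a.wav": datetime.datetime(2023, 4, 15, 6, 30)}

print(file_to_datetime("/data/site1/20230415_063000.WAV"))
```

Either form would be passed as file_to_datetime=... when calling embed_to_hoplite_db().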

Returns:

(embedding_db, dict with info about inserted window_ids and failed samples)

Effects:

Inserts embeddings into the provided hoplite database. Adds deployment and recording entries to db as needed.
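The overflow_mode options amount to a range check against float16 before insertion. The numpy sketch below mirrors the three options. The function name `check_float16_overflow` is hypothetical. The library's actual handling of out-of-range values (for example, whether anything is clipped) is not specified here.

```python
import warnings

import numpy as np

FLOAT16_MAX = np.finfo(np.float16).max  # largest finite float16 value, 65504


def check_float16_overflow(embeddings, overflow_mode="warn"):
    """Hypothetical check mirroring the overflow_mode options.

    Returns True if any value falls outside the finite float16 range.
    """
    if overflow_mode not in ("warn", "error", "ignore"):
        raise ValueError("overflow_mode must be 'warn', 'error', or 'ignore'")
    overflow = bool(np.any(np.abs(np.asarray(embeddings)) > FLOAT16_MAX))
    if overflow and overflow_mode == "error":
        raise OverflowError("embedding values exceed the float16 range")
    if overflow and overflow_mode == "warn":
        warnings.warn("embedding values exceed the float16 range")
    return overflow
```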

eval(targets=None, scores=None, reset_metrics=True)

Compute single-target or multi-target metrics from targets and scores.

Or, if targets is None, compute metrics on the values accumulated in self.torch_metrics.

By default, the overall model score is “map” (mean average precision) for multi-target models (self.single_target=False) and “f1” (average of f1 score across classes) for single-target models.

To customize the metrics, update self.torch_metrics to include the desired metrics.

Parameters:
  • targets – 0/1 for each sample and each class - if None, runs metric.compute() on each of self.torch_metrics (using accumulated values)

  • scores – continuous values in [0, 1] for each sample and class - if targets is None, this is ignored

  • reset_metrics – if True, resets the metrics after computing them [default: True]

Returns:

dictionary of metrics (name: value)

Return type:

dict

Raises:

AssertionError – if targets are outside of range [0,1]
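For single-target models the default score is "f1", the average of per-class F1 scores. Below is a numpy sketch of that macro average. It assumes each sample's predicted class is its highest-scoring class, and the function name is hypothetical. OpenSoundscape computes its metrics with self.torch_metrics, whose handling of edge cases may differ, for example classes absent from both targets and predictions.

```python
import numpy as np


def macro_f1_single_target(targets, scores):
    """Sketch: average per-class F1 for a single-target problem.

    targets: [n_samples, n_classes] array of 0/1 (one 1 per row)
    scores: [n_samples, n_classes] array of continuous scores
    """
    targets = np.asarray(targets)
    scores = np.asarray(scores)
    true_cls = targets.argmax(axis=1)
    pred_cls = scores.argmax(axis=1)  # predicted class = highest score
    f1s = []
    for c in range(targets.shape[1]):
        tp = np.sum((pred_cls == c) & (true_cls == c))
        fp = np.sum((pred_cls == c) & (true_cls != c))
        fn = np.sum((pred_cls != c) & (true_cls == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); 0 is an assumed value for empty classes
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))


targets = [[1, 0], [0, 1], [1, 0], [0, 1]]
scores = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
print(macro_f1_single_target(targets, scores))
```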

generate_cams(samples, method='gradcam', classes=None, target_layers=None, guided_backprop=False, progress_bar=True, audio_root=None, **dataloader_kwargs)

Generate activation and/or backprop heatmaps for each sample

Parameters:
  • samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a file path (str or Path) to a single audio file, OR - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths

  • method

    method to use for activation map. Can be a str (choose from below), a class of pytorch_grad_cam (any subclass of BaseCAM), or None; if None, activation maps will not be created [default: ‘gradcam’]

    str can be any of the following:

    • “gradcam”: pytorch_grad_cam.GradCAM
    • “hirescam”: pytorch_grad_cam.HiResCAM
    • “scorecam”: pytorch_grad_cam.ScoreCAM
    • “gradcam++”: pytorch_grad_cam.GradCAMPlusPlus
    • “ablationcam”: pytorch_grad_cam.AblationCAM
    • “xgradcam”: pytorch_grad_cam.XGradCAM
    • “eigencam”: pytorch_grad_cam.EigenCAM
    • “eigengradcam”: pytorch_grad_cam.EigenGradCAM
    • “layercam”: pytorch_grad_cam.LayerCAM
    • “fullgrad”: pytorch_grad_cam.FullGrad
    • “gradcamelementwise”: pytorch_grad_cam.GradCAMElementWise

  • classes (list) – list of classes, will create maps for each class [default: None] if None, creates an activation map for the highest scoring class on a sample-by-sample basis

  • target_layers (list) –

    list of target layers for GradCAM. If None [default], attempts to use the architecture’s default target_layer. Note: only architectures created with opensoundscape 0.9.0+ will have a default target layer. See pytorch_grad_cam docs for suggestions. Note: if multiple layers are provided, the activations are merged across layers (rather than returning separate activations per layer)

  • guided_backprop – bool [default: False] if True, performs guided backpropagation for each class in classes. AudioSamples will have attribute .gbp_maps, a pd.Series indexed by class name

  • audio_root – str or Path, root directory to prepend to audio file paths in samples, if samples do not contain full paths. [default: None]

  • **dataloader_kwargs – passed to SafeAudioDataloader (incl: batch_size, num_workers, split_file_into_clips, bypass_augmentations, invalid_sample_behavior, overlap_fraction, final_clip, other DataLoader args)

Returns:

a list of AudioSample objects with .cam attribute, an instance of the CAM class (visualize with sample.cam.plot()). See the CAM class for more details

See pytorch_grad_cam documentation for references to the source of each method.

generate_samples(samples, invalid_samples_log=None, return_invalid_samples=False, audio_root=None, **dataloader_kwargs)

Generate AudioSample objects. Input options same as .predict()

Parameters:
  • samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path as str or pathlib.Path

  • other arguments – see .predict() documentation

  • **dataloader_kwargs – any arguments to inference_dataloader_cls.__init__ except samples (uses samples) and collate_fn (uses identity) (Note: default class is SafeAudioDataloader)

Returns:

a list of AudioSample objects - if return_invalid_samples is True, returns second value: list of paths to samples that failed to preprocess

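Example:

```python
from opensoundscape import CNN
from opensoundscape.preprocess.utils import show_tensor_grid

# generate_samples() is a method, so it needs a model (and its preprocessor)
model = CNN("resnet18", classes=[0, 1], sample_duration=1, sample_rate=32000)
samples = model.generate_samples(["/path/file1.wav", "/path/file2.wav"])
tensors = [s.data for s in samples]
show_tensor_grid(tensors, columns=3)
```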

classmethod load(path, unpickle=True)

load a model saved using CNN.save()

Parameters:
  • path – path to file saved using CNN.save()

  • unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True, which saves the entire model object. If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]

Returns:

new CNN instance

Note: if you used pickle=True when saving, the model object might not load properly across different versions of OpenSoundscape.

load_weights(path, strict=True)

Load network weights state dict from a file

For instance, load weights saved with .save_weights(). This is an in-place operation.

Parameters:
  • path – file path with saved weights

  • strict – (bool) see torch.nn.Module.load_state_dict()

log_file

specify a path to save output to a text file

logging_level

amount of logging to self.log_file. 0 for nothing, 1,2,3 for increasing logged info

loss_hist

list of batch loss values during training

name = 'SpectrogramClassifier'
per_class_metrics(targets, scores)

compute per-class metrics: au_roc, avg precision

can override this method to customize per-class metrics

Parameters:
  • targets – 2d array of 0/1 for each sample and each class

  • scores – 2d array of continuous valued score for each sample and class

Returns:

dictionary of per-class metrics

{class_name: {metric_name: value}}
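To make the two per-class metrics concrete, here is a numpy sketch that returns the same nested-dictionary shape. This is a swapped-in implementation, not the library's. The metric key names "au_roc" and "avg_precision" are assumptions. AUROC is computed as the probability that a random positive outranks a random negative. Average precision is the mean precision at each positive's rank, which can differ from threshold-based implementations when scores are tied. Averaging AP over classes gives mean average precision, the "map" score used by eval().

```python
import numpy as np


def auroc(y, s):
    """P(random positive scores above random negative); ties count 0.5."""
    pos, neg = s[y == 1], s[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined without both positives and negatives
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def average_precision(y, s):
    """Mean of the precision values at the rank of each positive sample."""
    order = np.argsort(-s, kind="stable")  # highest score first
    hits = y[order] == 1
    if not hits.any():
        return float("nan")
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


def per_class_metrics(targets, scores, classes):
    """Hypothetical sketch: {class_name: {metric_name: value}}."""
    targets, scores = np.asarray(targets), np.asarray(scores, dtype=float)
    return {
        c: {
            "au_roc": auroc(targets[:, i], scores[:, i]),
            "avg_precision": average_precision(targets[:, i], scores[:, i]),
        }
        for i, c in enumerate(classes)
    }


targets = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.1]]
metrics = per_class_metrics(targets, scores, ["a", "b"])
mean_ap = np.mean([m["avg_precision"] for m in metrics.values()])
```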

predict(samples, batch_size=1, num_workers=0, activation_layer=None, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='extend', bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, wandb_session=None, return_invalid_samples=False, progress_bar=True, audio_root=None, output_size_warning=1000000000.0, **dataloader_kwargs)

Generate predictions on a set of samples

Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)

Parameters:
  • samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path (str or pathlib.Path)

  • batch_size – Number of files to load simultaneously [default: 1]

  • num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]

  • activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw logit scores [-inf:inf] - ‘softmax’: scores all classes sum to 1, scores between 0 and 1 - ‘sigmoid’: each class is independent, scores between 0 and 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]

  • overlap_fraction – see opensoundscape.utils.generate_clip_times_df

  • clip_overlap – see opensoundscape.utils.generate_clip_times_df

  • clip_step – see opensoundscape.utils.generate_clip_times_df

  • final_clip – see opensoundscape.utils.generate_clip_times_df

  • bypass_augmentations – if False, Actions with is_augmentation==True are performed [default: True]

  • invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.

  • raise_errors – if True, raises errors when preprocessing fails; if False, just logs the errors to invalid_samples_log

  • wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb

  • return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • output_size_warning – int, if >0, raises a warning if the number of output scores (clips * classes) exceeds this number, as this can cause heavy memory usage. Set to None or 0 to disable. [default: 1e9]

  • **dataloader_kwargs – additional arguments to self.predict_dataloader()
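The activation_layer options can be sketched in numpy, applied row-wise to a [samples, classes] array of logits. This mirrors the option descriptions above but is not the library's code, and the helper name `apply_activation` is hypothetical. "softmax_and_logit" maps softmax probabilities back onto an unbounded scale. A class whose softmax score rounds to exactly 1.0 would produce inf.

```python
import numpy as np

VALID = (None, "sigmoid", "softmax", "softmax_and_logit")


def apply_activation(logits, activation_layer=None):
    """Sketch of predict()'s activation_layer options, applied per row (sample)."""
    if activation_layer not in VALID:
        raise ValueError(f"unknown activation_layer: {activation_layer}")
    logits = np.asarray(logits, dtype=float)
    if activation_layer is None:
        return logits  # raw logit scores in (-inf, inf)
    if activation_layer == "sigmoid":
        return 1 / (1 + np.exp(-logits))  # each class independent, in (0, 1)
    # softmax across classes; subtracting the row max avoids overflow in exp
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)  # each row sums to 1
    if activation_layer == "softmax":
        return probs
    # "softmax_and_logit": logit(p) = log(p / (1 - p)) of the softmax output
    return np.log(probs / (1 - probs))
```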

Returns:

df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess

Effects:

(1) wandb logging: if wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top-scoring samples are logged. Use the self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.

(2) invalid sample logging: if invalid_samples_log is not None, saves a list of all file paths that failed to preprocess in invalid_samples_log as a text file

Note: if loading an audio file raises a PreprocessingError, the scores for that sample will be np.nan

profile(samples, batch_size=1, num_workers=0, forward=True, backward=True, bypass_augmentations=False, **dataloader_kwargs)

Profile the model preprocessing, forward, and backward speeds on a set of samples

Parameters:
  • samples – (same as CNN.predict()) the files to generate predictions for. Can be: - a file path (str or Path) to a single audio file, OR - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths

  • batch_size – number of samples to process simultaneously

  • num_workers – number of parallel CPU tasks for preprocessing

  • forward – bool, if True, profiles forward pass time [default: True]

  • backward – bool, if True, profiles backward pass time [default: True]

  • bypass_augmentations – bool, if True, bypasses data augmentations during preprocessing [default: False]

  • **dataloader_kwargs – additional keyword arguments to pass to the dataloader

Returns:

a dictionary with timing information for:
  • breakdown of time spent on each preprocessing step (measured for one sample)

  • preprocessing time per batch and per sample (seconds)

  • if forward=True: forward pass time per batch and per sample (seconds)

  • if backward=True: backward pass time per batch and per sample (seconds)

Return type:

dict

Example:

```python
import opensoundscape as opso

m = opso.CNN("resnet18", [0, 1], 1, 32000)
# m.device = "cpu"  # optionally set a specific device
m.network.to(m.device)
samples = opso.utils.make_clip_df([opso.birds_path] * 10, clip_duration=1)
results_dict = m.profile(samples, batch_size=32, num_workers=0)
```

run_evaluation(validation_df, progress_bar=True, **kwargs)

Generate predictions on labeled data and compute evaluation metrics

Override this to customize the validation step during training; e.g., it could run validation on multiple datasets and save the performance of each in self.valid_metrics[current_epoch][validation_dataset_name]

Parameters:
  • validation_df – dataframe of validation samples

  • progress_bar – if True, show a progress bar with tqdm

  • **kwargs – passed to self.predict_dataloader()

Returns:

metrics – dictionary of evaluation metrics calculated with self.torch_metrics

Return type:

dict

Effects:

updates self.valid_metrics[current_epoch] with metrics for the current epoch

property sample_duration
property sample_rate
save(path, save_hooks=False, pickle=False, error='raise')

save model with weights using torch.save()

load from saved file with cnn.load_model(path)

Parameters:
  • path – file path for saved model object

  • save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()

  • pickle – if True, saves the entire model object using torch.save(). Note: if using pickle=True, the entire object is pickled, which means that saving and loading model objects across OpenSoundscape versions might not work properly. pickle=True is useful for resuming training, because it retains the state of the optimizer, scheduler, loss function, etc.; pickle=False is recommended for saving models for inference/deployment/sharing [default: False]

  • error – behavior if saving with pickle fails - “raise”: raise RuntimeError - “warn”: issue a warning and save unpickled model instead - “ignore”: no action (model not saved) [default: “raise”]

save_onnx(path, activation_layer=None, include_preprocessor_output=True, include_embedding_output=True, include_classifier_output=True, **kwargs)

Export the model to ONNX format

The preprocessor must be a TorchSpectrogramPreprocessor with torch.nn.Modules in preprocessor.pipeline['transform'].transforms (see example below)

See also: to_onnx_model() to create opensoundscape.ONNXModel for inference

Requires the onnx, onnxruntime, and onnxscript packages to be installed

Parameters:
  • path – file path to save the ONNX model; pass None to return an in-memory torch.onnx.ONNXProgram object without saving to disk

  • activation_layer – if provided, applies an activation layer to classifier outputs options: ‘softmax’, ‘sigmoid’, or None [default: None]

  • include_preprocessor_output – if True, includes the output of the preprocessor in the ONNX model outputs as key “preprocessor” [default: True]

  • include_embedding_output – if True, includes the output of the embedding layer in the ONNX model outputs as key “embedding” [default: True]

  • include_classifier_output – if True, includes the output of the classifier in the ONNX model outputs as key “classifier” [default: True]

  • **kwargs – additional keyword arguments passed to opensoundscape.ml.export.to_onnx_program()

Returns:

onnx_program – a torch.onnx.ONNXProgram object

Return type:

torch.onnx.ONNXProgram

Example:

Exporting an EfficientNet model to ONNX format:

```python
from opensoundscape import CNN, preprocessors

model = CNN(
    architecture="efficientnet_b0",
    classes=[0, 1, 2, 3],
    sample_duration=3,
    preprocessor_cls=preprocessors.TorchSpectrogramPreprocessor,
    sample_rate=32000,
)
onnx_program = model.save_onnx("./opso_efficientnet.onnx")
```

Using the saved model for inference with ONNX Runtime:

```python
import numpy as np
import onnx
import onnxruntime

combined_model = onnx.load("opso_efficientnet.onnx")
output_names = [node.name for node in combined_model.graph.output]

onnx.checker.check_model(combined_model)

EP_list = ["CPUExecutionProvider"]  # ["CUDAExecutionProvider", "CPUExecutionProvider"]
ort_session = onnxruntime.InferenceSession("opso_efficientnet.onnx", providers=EP_list)

# make up some random inputs
audio_samples_per_input = (
    combined_model.graph.input[0].type.tensor_type.shape.dim[2].dim_value
)
batch_size = 3
input_batched = np.random.rand(batch_size, 1, audio_samples_per_input).astype(
    np.float32
)

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: input_batched}
ort_outs = ort_session.run(None, ort_inputs)

# restore the name-value dictionary mapping of outputs
outs_dict = {name: ort_outs[i] for i, name in enumerate(output_names)}
print(f"shape of outputs for inference on one batch of batch size {batch_size}:")
print({k: v.shape for k, v in outs_dict.items()})
```

Example 2: Exporting a model with customized preprocessing transforms

```python
from opensoundscape import CNN, preprocessors

model = CNN(
    architecture="efficientnet_b0",
    classes=[0, 1, 2, 3],
    sample_duration=3,
    preprocessor_cls=preprocessors.TorchSpectrogramPreprocessor,
    sample_rate=32000,
    bandpass_range=(3000, 10000),
    lower_dB_range=-30,
    rescale_mean_sd=(-30, 20),
    spec_nfft=512,
    spec_window_length=512,
    spec_hop_length=128,
    # resize_ft=(200, 512),  # using resize_ft breaks serialization for json save/load!
    n_mels=64,
)
onnx_program = model.save_onnx("./opso_efficientnet_melspec.onnx")
```

Example 3: Writing a custom list of preprocessing transforms

```python
import torchaudio
from opensoundscape import CNN, preprocessors

model = CNN("resnet18", classes=[0], sample_duration=5, sample_rate=32000)

# custom list of torchaudio and torchvision transforms
my_transforms = [
    torchaudio.transforms.Spectrogram(
        n_fft=512,
        win_length=512,
        hop_length=128,
        center=False,  # highly recommended: default=True zero-pads, creating extra columns
    ),
    torchaudio.transforms.AmplitudeToDB(top_db=80),
]
model.preprocessor = preprocessors.TorchSpectrogramPreprocessor(
    sample_rate=32000,
    sample_duration=model.preprocessor.sample_duration,
    torch_transforms=my_transforms,
)
onnx_program = model.save_onnx("./opso_resnet18_custom.onnx")
```

save_weights(path)

save just the weights of the network

This allows the saved weights to be used more flexibly than model.save(pickle=True), which pickles the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())

Parameters:

path – location to save weights file

similarity_search_hoplite_db(query_samples, db, num_results=5, exact_search=False, search_subset_size=None, target_score=None, audio_root=None, search_kwargs=None, **embedding_kwargs)

Perform a similarity search in the Hoplite database.

Parameters:
  • query_samples – audio examples for which to find the most similar examples: a file path, list of paths, or dataframe with (file, start_time, end_time) multi-index

  • db – a Hoplite database containing embeddings from the same model

  • num_results – the number of results to return for each query

  • exact_search – if False [default], uses usearch approximate search (faster); if True, uses brute-force search

  • search_subset_size – number of embeddings to compare with. For floats between 0 and 1, samples a proportion of the database; for ints, samples the specified number of embeddings. If None [default], searches all embeddings. Note: only implemented for exact_search=True

  • target_score – if specified, searches for similarity scores close to target_score; the default [None] searches for the most similar embeddings

  • audio_root – root directory for relative paths to query audio files

  • search_kwargs – dict of additional keyword arguments passed to db.ui.search(), or to brutalism.threaded_brute_search() if exact_search=True. Accepted keys for exact_search=False: radius, threads, exact, log, progress; for exact_search=True: batch_size, max_workers, rng_seed

  • **embedding_kwargs – additional keyword arguments passed to self.embed(), such as batch_size and num_workers

Returns:

A dataframe with the search results, including columns:

  • query_file, query_start_time, query_end_time: the query sample info

  • file, window_id: the matched sample filepath and window_id from the database

  • start_time, end_time: the matched sample start and end time (relative to file) from the database

  • sort_score: the similarity score between the query and matched sample

Return type:

pandas.DataFrame
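The exact_search=True option is described as a brute-force search. The NumPy sketch below illustrates that idea under stated assumptions: cosine similarity is used as the score (the actual metric is not stated here), and when target_score is given, results are ranked by closeness to that score instead of by highest similarity. brute_force_search is a hypothetical helper, not part of OpenSoundscape or Hoplite.

```python
import numpy as np


def brute_force_search(query, database, num_results=5, target_score=None):
    """Hypothetical brute-force search returning (indices, scores).

    query: 1-D embedding vector; database: 2-D array with one embedding per row.
    Cosine similarity is assumed as the score.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q  # cosine similarity of every database row to the query

    if target_score is None:
        order = np.argsort(-scores)  # most similar first
    else:
        order = np.argsort(np.abs(scores - target_score))  # closest to target first
    top = order[:num_results]
    return top, scores[top]


rng = np.random.default_rng(0)
database = rng.normal(size=(100, 16))
query = database[42] + 0.01 * rng.normal(size=16)  # near-duplicate of row 42

idx, scores = brute_force_search(query, database, num_results=3)
print(idx[0], round(float(scores[0]), 3))
```

Comparing against every stored embedding is exact but scales linearly with database size, which is why an approximate index is the faster default.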

train(train_df, validation_df=None, steps=1000, batch_size=64, num_workers=0, save_path='.', save_interval=-1, log_interval=50, validation_interval=100, reset_optimizer=False, restart_scheduler=False, invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, progress_bar=True, audio_root=None, reload_best_at_end=True, **dataloader_kwargs)[source]

train the model on samples from train_df

If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling .train().

Parameters:
  • train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)

  • validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]

  • steps – number of steps (i.e. batches or updates) to train the model for

  • batch_size – number of training files simultaneously passed through the forward pass, loss function, and backpropagation

  • num_workers – number of parallel CPU tasks for preprocessing. Note: use 0 for a single (root) process (not 1)

  • save_path – location to save intermediate and best model objects [default: ".", i.e. the current working directory]

  • save_interval – interval in steps at which to save the model object with weights. Note: best.model and last.model are always saved; [default: -1] means only best.model and last.model are saved, with no additional interval saves.

  • log_interval – interval in batches at which to print training loss/metrics

  • validation_interval – interval in steps at which to test the model on the validation set. Note that the model only updates its best score and saves the best.model file on steps where it performs validation.

  • reset_optimizer – if True, resets the optimizer rather than retaining state_dict of self.optimizer [default: False]

  • restart_scheduler – if True, resets the learning rate scheduler rather than retaining state_dict of self.scheduler [default: False]

  • invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to. Pass the value returned by wandb.init() to log progress to a Weights and Biases run; if None, does not log to wandb. For example:

        import wandb
        wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
        session = wandb.init(entity='mygroup', project='project1', name='first_run')
        ...
        model.train(..., wandb_session=session)
        session.finish()

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • progress_bar – bool, if True, shows a progress bar with tqdm [default: True]

  • reload_best_at_end – if True, after training completes, reloads the best model weights into self.network [default: True] Best model is determined by validation set’s self.score_metric score

  • **dataloader_kwargs – additional arguments passed to train_dataloader()

Effects:

If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.
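The interplay of steps, log_interval, validation_interval, and save_interval can be reasoned about with the hypothetical helper below. It assumes (this page does not confirm it) that an event fires whenever the 1-indexed step count is a multiple of its interval, and that save_interval=-1 disables interval saves. It makes visible why best.model can only change on validation steps.

```python
def training_events(steps, validation_interval=100, save_interval=-1, log_interval=50):
    """Hypothetical: map each step (1-indexed) to the events assumed to fire there."""
    events = {}
    for step in range(1, steps + 1):
        fired = []
        if step % log_interval == 0:
            fired.append("log")
        if step % validation_interval == 0:
            fired.append("validate")  # best.model can only be updated here
        if save_interval > 0 and step % save_interval == 0:
            fired.append("save")  # interval save; disabled when save_interval=-1
        if fired:
            events[step] = fired
    return events


for step, fired in training_events(300).items():
    print(step, fired)
```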

verbose

amount of logging to stdout. 0 for nothing, 1,2,3 for increasing printed output

class opensoundscape.ml.cnn.SpectrogramModule(architecture, classes, sample_duration, sample_rate, single_target=False, preprocessor_dict=None, preprocessor_cls=<class 'opensoundscape.preprocess.preprocessors.SpectrogramPreprocessor'>, arch_weights='DEFAULT', **preprocessor_kwargs)[source]

Bases: BaseModule

Parent class for SpectrogramClassifier (pytorch) and LightningSpectrogramModule (lightning)

implements functionality that is shared between both pure PyTorch and Lightning classes/workflows

change_classes(new_classes, hidden_layers=None)[source]

change the classes that the model predicts

replaces the network’s final linear classifier layer with a new layer (or MLP, if hidden_layers is not None) initialized with random weights and the correct number of output features

Supports torch.nn.Linear and opensoundscape.ml.shallow_classifier.MLPClassifier as the classifier layer to update. Will raise an error if self.network.classifier_layer is a different type

Parameters:
  • new_classes – list of class names

  • hidden_layers – list of hidden layer sizes for the new classifier:
    • None: creates a single torch.nn.Linear layer
    • (int, …): creates an MLPClassifier object with hidden layers of the specified sizes; e.g. (100, 50) creates 2 hidden layers with 100 and 50 neurons, respectively
    • (): empty tuple creates an MLPClassifier with no hidden layers
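The mapping from hidden_layers to a new classifier head can be sketched in plain PyTorch. Here torch.nn.Sequential with ReLU activations stands in for opensoundscape.ml.shallow_classifier.MLPClassifier (its actual internal layout and activation are assumptions), make_classifier is a hypothetical helper, and in_features would come from the existing classifier layer.

```python
import torch


def make_classifier(in_features, num_classes, hidden_layers=None):
    """Hypothetical helper mirroring the hidden_layers options described above."""
    if hidden_layers is None:
        # None: a single linear layer mapping features to class scores
        return torch.nn.Linear(in_features, num_classes)
    layers = []
    size = in_features
    for hidden in hidden_layers:
        # each hidden layer: Linear followed by a nonlinearity (ReLU assumed)
        layers += [torch.nn.Linear(size, hidden), torch.nn.ReLU()]
        size = hidden
    layers.append(torch.nn.Linear(size, num_classes))  # output layer
    # an empty tuple yields an MLP-style head with only the output layer
    return torch.nn.Sequential(*layers)


new_classes = ["robin", "wren", "jay"]
head = make_classifier(512, len(new_classes), hidden_layers=(100, 50))
print(head(torch.zeros(4, 512)).shape)  # torch.Size([4, 3])
```

The new layers start from random initialization, so the head must be trained before its outputs are meaningful.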

change_classifier(new_classifier, classes=None)[source]

Replaces the classifier layer

replaces the network’s final linear classifier layer with a new classifier

Parameters:
  • new_classifier – the new classifier to replace the existing one; typically a torch.nn.Linear or opensoundscape.ml.shallow_classifier.MLPClassifier object

  • classes – optional list of class names to set for the new classifier; if None, will attempt to copy from new_classifier.classes attribute

property classifier

return the classifier layer of the network, based on .network.classifier_layer string

compute_per_class_metrics

if True, compute and log per-class metrics during training/validation

freeze_feature_extractor()[source]

freeze all layers except self.classifier

prepares the model for transfer learning where only the classifier is trained

uses the attribute self.network.classifier_layer (via the .classifier attribute) to identify the classifier layer

if this is not set, raises an Exception; use freeze_layers_except() instead

freeze_layers_except(train_layers=None)[source]

Freeze all parameters of a model except the parameters in train_layers

Freezing parameters means that the optimizer will not update the weights

Modifies the model in place!

Parameters:
  • train_layers – layer or list/iterable of the layers whose parameters should not be frozen. For example: pass model.classifier to train only the classifier

Example 1: ` model.freeze_layers_except(model.classifier) `

Example 2: freeze all but 2 layers ` model.freeze_layers_except([model.network.layer1, model.network.layer2]) `
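In PyTorch, freezing a parameter means setting requires_grad=False so the optimizer never updates it. The sketch below reproduces the described behavior on a small stand-in network using a hypothetical freeze_all_except helper; it is an illustration, not the OpenSoundscape implementation.

```python
import torch


def freeze_all_except(model, train_layers):
    """Hypothetical: freeze every parameter except those in train_layers."""
    if isinstance(train_layers, torch.nn.Module):
        train_layers = [train_layers]
    # ids of parameters that should remain trainable
    keep = {id(p) for layer in train_layers for p in layer.parameters()}
    for p in model.parameters():
        p.requires_grad = id(p) in keep


net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 2),  # treated as the "classifier" layer here
)
freeze_all_except(net, net[3])
trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)  # ['3.weight', '3.bias']
```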

lr_scheduler_step

track number of calls to lr_scheduler.step()

set to -1 to restart learning rate schedule from initial lr

this value is used to initialize the lr_scheduler’s last_epoch parameter. It is tracked separately from self.current_step because the lr_scheduler might be stepped once per epoch, once per step, or at other intervals.

Note that the initial learning rate is set via self.optimizer_params[‘kwargs’][‘lr’]
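The counter mirrors PyTorch's own last_epoch bookkeeping in its learning rate schedulers. The sketch below uses torch.optim.lr_scheduler.StepLR on a stand-in parameter; the scheduler type and its step_size and gamma values are arbitrary choices for illustration. It shows that last_epoch counts scheduler.step() calls and that the schedule decays from the optimizer's initial lr.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)  # initial lr, cf. optimizer_params['kwargs']['lr']
# halve the learning rate on every scheduler.step() call
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

for _ in range(2):
    optimizer.step()  # a real loop would compute a loss and gradients first
    scheduler.step()

print(scheduler.last_epoch)  # 2: number of scheduler.step() calls
print(optimizer.param_groups[0]["lr"])  # 0.025 (0.1 halved twice)
```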

network

a pytorch Module such as Resnet18 or a custom object

for convenience, __init__ also allows the user to provide a string matching a key from opensoundscape.ml.cnn_architectures.ARCH_DICT.

List options: opensoundscape.ml.cnn_architectures.list_architectures()

property single_target
unfreeze()[source]

Unfreeze all layers & parameters of self.network

Enables gradient updates for all layers & parameters

Modifies the object in place

opensoundscape.ml.cnn.get_channel_dim(model)[source]
opensoundscape.ml.cnn.list_model_classes()[source]

return list of available action function keyword strings (can be used to initialize Action class)

opensoundscape.ml.cnn.load_model(path, device=None, unpickle=True)[source]

load a saved model object

This function handles models saved either as pickled objects or as a dictionary including weights, preprocessing parameters, architecture name, etc.

Note that pickled objects may not load properly across different versions of OpenSoundscape, while the dictionary format does not retain the full training state for resuming model training.

Parameters:
  • path – file path of saved model

  • device – which device to load into, eg ‘cuda:1’ [default: None] will choose first gpu if available, otherwise cpu

  • unpickle – if True, passes weights_only=False to torch.load(). This is necessary if the model was saved with pickle=True (which saves the entire model object). If unpickle=False, this function will work if the model was saved with pickle=False, but will raise an error if the model was saved with pickle=True. [default: True]

Returns:

a model object with loaded weights

opensoundscape.ml.cnn.register_model_cls(model_cls)[source]

add class to MODEL_CLS_DICT

this allows us to recreate the class when loading saved model file with load_model()

opensoundscape.ml.cnn.use_resample_loss(model, train_df)[source]

Modify a model to use ResampleLoss for multi-target training

ResampleLoss may perform better than BCE Loss for multitarget problems in some scenarios.

Parameters:
  • model – CNN object

  • train_df – dataframe of labels, used to calculate class frequency

opensoundscape.ml.cnn_architectures module

Module to initialize PyTorch CNN architectures with custom output shape

This module allows the use of several built-in CNN architectures from PyTorch. The architecture refers to the specific layers and layer input/output shapes (including convolution sizes and strides, etc) - such as the ResNet18 or EfficientNet B0 architecture.

We provide wrappers which modify the output layer to the desired shape (to match the number of classes). The way to change the output layer shape depends on the architecture, which is why we need a wrapper for each one. This code is based on pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

To use these wrappers, for example, if your model has 10 output classes, write

my_arch=resnet18(10)

Then you can initialize a model object from opensoundscape.ml.cnn with your architecture:

model=CNN(my_arch,classes,sample_duration, sample_rate)

or override an existing model’s architecture:

model.network = my_arch

opensoundscape.ml.cnn_architectures.alexnet(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for AlexNet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.change_conv2d_channels(conv2d, num_channels=3, reuse_weights=True)[source]

Modify the number of input channels for a pytorch CNN

This function changes the input shape of a torch.nn.Conv2D layer to accommodate a different number of channels. It attempts to retain weights in the following manner:

  • if num_channels is less than the original, it averages weights across the original channels and applies them to all new channels

  • if num_channels is greater than the original, it cycles through the original channels, copying them to the new channels

Parameters:
  • conv2d – the torch.nn.Conv2D layer to modify

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

  • reuse_weights – if True (default), averages (if num_channels<original) or cycles through (if num_channels>original) the original channel weights and adds them to the new Conv2D
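The weight-reuse rule described above (average when reducing channels, cycle when adding channels) can be sketched in PyTorch as follows. adapt_conv2d_channels is a hypothetical name; the sketch assumes groups=1 and the default padding_mode, and it is not the library's implementation.

```python
import torch


def adapt_conv2d_channels(conv, num_channels):
    """Hypothetical sketch: rebuild a Conv2d with num_channels inputs, reusing weights.

    Assumes conv.groups == 1 and the default padding_mode.
    """
    old_w = conv.weight.detach()  # shape: (out_channels, in_channels, kh, kw)
    old_in = old_w.shape[1]
    new_conv = torch.nn.Conv2d(
        num_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        if num_channels < old_in:
            # fewer channels: average across the original input channels
            avg = old_w.mean(dim=1, keepdim=True)
            new_conv.weight.copy_(avg.expand(-1, num_channels, -1, -1))
        else:
            # more (or equal) channels: cycle through the original channels
            idx = [i % old_in for i in range(num_channels)]
            new_conv.weight.copy_(old_w[:, idx])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


rgb_conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
mono_conv = adapt_conv2d_channels(rgb_conv, 1)  # e.g. single-channel spectrograms
print(mono_conv.weight.shape)  # torch.Size([16, 1, 3, 3])
```

Reusing pre-trained filters this way keeps the learned spatial patterns, which is usually a better starting point than random weights when fine-tuning on spectrograms.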

opensoundscape.ml.cnn_architectures.change_fc_output_size(fc, num_classes)[source]

Modify the number of output nodes of a fully connected layer

Parameters:
  • fc – the fully connected layer of the model that should be modified

  • num_classes – number of output nodes for the new fc
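A minimal sketch of the idea: build a fresh torch.nn.Linear that keeps the original in_features (and bias setting) while changing out_features. Whether the original layer's weights are reused is not stated here, so this sketch simply relies on PyTorch's default random initialization; resize_fc is a hypothetical name.

```python
import torch


def resize_fc(fc, num_classes):
    """Hypothetical sketch: new Linear layer with the same inputs, num_classes outputs."""
    return torch.nn.Linear(fc.in_features, num_classes, bias=fc.bias is not None)


fc = torch.nn.Linear(512, 1000)  # e.g. an ImageNet-sized output layer
new_fc = resize_fc(fc, 10)
print(new_fc)  # Linear(in_features=512, out_features=10, bias=True)
```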

opensoundscape.ml.cnn_architectures.densenet121(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for densenet121 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.efficientnet_b0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_b0 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

Note: in v0.10.2, changed from using the NVIDIA/DeepLearningExamples:torchhub repo implementation to the native PyTorch implementation

opensoundscape.ml.cnn_architectures.efficientnet_b1(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_b1 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.efficientnet_b4(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for efficientnet_b4 architecture

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

Note: in v0.10.2, changed from using the NVIDIA/DeepLearningExamples:torchhub repo implementation to the native PyTorch implementation

opensoundscape.ml.cnn_architectures.freeze_params(model)[source]

disable gradient updates for all model parameters

opensoundscape.ml.cnn_architectures.generic_make_arch(constructor, weights, num_classes, embed_layer, cam_layer, name, input_conv2d_layer, linear_clf_layer, freeze_feature_extractor=False, num_channels=3)[source]

construct a CNN architecture, then adapt the input channels and output layer according to channels and num_classes arguments

works when first layer is conv2d and last layer is fully-connected Linear

input_size = 224

Parameters:
  • constructor – function that creates a torch.nn.Module and takes weights argument

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture, passed to constructor(). If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_classes – number of output nodes for the final layer

  • embed_layer – specify which layer’s outputs should be accessed for “embeddings”

  • cam_layer – specify a default layer for GradCAM/etc visualizations

  • name – name of the architecture, used for the constructor_name attribute to re-load from a saved version

  • input_conv2d_layer – name of the first Conv2D layer, accessible with .get_submodule(); a string formatted as a .-delimited list of attribute names or list indices, e.g. “features.0”

  • linear_clf_layer – name of the final Linear classification fc layer, accessible with .get_submodule(); a string formatted as a .-delimited list of attribute names or list indices, e.g. “classifier.0.fc”

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.list_architectures()[source]

return list of available architecture keyword strings

opensoundscape.ml.cnn_architectures.register_arch(func)[source]

add architecture to ARCH_DICT
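list_architectures() and register_arch() follow a common registry pattern: a module-level dict maps names to constructor functions, a decorator adds entries, and a listing function returns the keys. The pure-Python sketch below illustrates that pattern with a hypothetical REGISTRY dict and a toy constructor; keying entries by the function's __name__ is an assumption, and the sketch does not reproduce ARCH_DICT's actual contents.

```python
REGISTRY = {}  # hypothetical stand-in for a dict such as ARCH_DICT


def register(func):
    """Decorator: store func under its __name__ and return it unchanged."""
    REGISTRY[func.__name__] = func
    return func


def list_registered():
    """Return the keyword strings that can be used to look up constructors."""
    return list(REGISTRY.keys())


@register
def tiny_arch(num_classes):
    # toy constructor; a real one would build and return a torch.nn.Module
    return {"name": "tiny_arch", "num_classes": num_classes}


print(list_registered())  # ['tiny_arch']
print(REGISTRY["tiny_arch"](10))  # {'name': 'tiny_arch', 'num_classes': 10}
```

Storing constructors by name is what lets a saved model record only a string and rebuild the matching architecture when it is loaded again.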

opensoundscape.ml.cnn_architectures.resnet101(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet101 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.resnet152(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet152 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.resnet18(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet18 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.resnet34(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet34 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.resnet50(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for ResNet50 architecture

input_size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.set_layer_from_name(module, layer_name, new_layer)[source]

assign an attribute of an object using a string name

Parameters:
  • module – object to assign attribute to

  • layer_name – string name of the attribute to assign. The name is formatted with a . delimiter and can contain either attribute names or list indices, e.g. “network.classifier.0.0.fc” sets network.classifier[0][0].fc. This type of string is given by torch.nn.Module.named_modules()

  • new_layer – replace layer with this torch.nn.Module instance

See also: torch.nn.Module.named_modules(), torch.nn.Module.get_submodule()
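Resolving a dotted name like “network.classifier.0.0.fc” amounts to walking the path with getattr for names and indexing for numeric parts, then assigning at the last part. The pure-Python sketch below shows that walk with a hypothetical set_by_dotted_name helper and SimpleNamespace stand-ins; the same indexing also works on torch.nn.Sequential, which supports integer item access and assignment.

```python
from types import SimpleNamespace


def set_by_dotted_name(root, name, value):
    """Hypothetical sketch: assign root.<a>[<i>]... = value from a '.'-delimited name.

    Numeric parts are treated as sequence indices, others as attribute names.
    """
    *parents, last = name.split(".")
    obj = root
    for part in parents:
        obj = obj[int(part)] if part.isdigit() else getattr(obj, part)
    if last.isdigit():
        obj[int(last)] = value
    else:
        setattr(obj, last, value)


inner = SimpleNamespace(fc="old layer")
root = SimpleNamespace(network=SimpleNamespace(classifier=[[inner]]))

# equivalent to: root.network.classifier[0][0].fc = "new layer"
set_by_dotted_name(root, "network.classifier.0.0.fc", "new layer")
print(root.network.classifier[0][0].fc)  # new layer
```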

opensoundscape.ml.cnn_architectures.squeezenet1_0(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for squeezenet architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.cnn_architectures.unfreeze_params(model)[source]

enable gradient updates for all model parameters

opensoundscape.ml.cnn_architectures.vgg11_bn(num_classes, freeze_feature_extractor=False, weights='DEFAULT', num_channels=3)[source]

Wrapper for vgg11 architecture

input size = 224

Parameters:
  • num_classes – number of output nodes for the final layer

  • freeze_feature_extractor – if False (default), the entire network has gradients and can be trained; if True, the feature block is frozen and only the final layer is trained

  • weights – string containing the version name of the pre-trained classification weights to use for this architecture. If ‘DEFAULT’, the model is loaded with the best available weights (note that these may change across versions). Pre-trained weights available for each architecture are listed at https://pytorch.org/vision/stable/models.html

  • num_channels – number of channels in the input sample, e.g. channels in a [channels, h, w] sample shape

opensoundscape.ml.dataloaders module

class opensoundscape.ml.dataloaders.SafeAudioDataloader(*args: Any, **kwargs: Any)[source]

Bases: DataLoader

Create DataLoader for inference, wrapping a SafeDataset

SafeDataset contains AudioFileDataset or AudioSampleDataset depending on sample type

During inference, we allow the user to pass any of these formats for samples:

  • list of file paths
  • Dataframe with file as index
  • Dataframe with (file, start_time, end_time) of clips as MultiIndex
  • Dataframe with (file, start_time, end_time) as columns
  • Dataframe with (file, start_time) as columns
  • Dataframe with (file) as column
  • CategoricalLabels object

If start_times are not specified, it will automatically determine the number of clips that can be created from the file (with overlap between subsequent clips based on overlap_fraction)
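The automatic clip splitting can be pictured with the hypothetical clip_start_times helper below. It assumes consecutive clips advance by clip_duration * (1 - overlap_fraction) and that a trailing partial clip is dropped; the real behavior, including the final_clip options, is defined by opensoundscape.utils.generate_clip_times_df.

```python
import math


def clip_start_times(file_duration, clip_duration, overlap_fraction=0.0):
    """Hypothetical sketch: start times of full-length clips that fit in a file.

    Consecutive clips advance by clip_duration * (1 - overlap_fraction);
    a trailing partial clip is dropped (an assumed final-clip policy).
    """
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    if file_duration < clip_duration:
        return []
    step = clip_duration * (1 - overlap_fraction)
    # small tolerance guards against floating-point round-off at the boundary
    n_clips = math.floor((file_duration - clip_duration) / step + 1e-9) + 1
    return [round(i * step, 6) for i in range(n_clips)]


print(clip_start_times(10, 3))  # [0.0, 3.0, 6.0]
print(clip_start_times(10, 2, overlap_fraction=0.5))  # 0.0 through 8.0 in 1 s steps
```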

Parameters:
  • samples – any of the following:
    • list of file paths
    • Dataframe with file, start_time, end_time of clips as index
    • Dataframe with (file, start_time, end_time) as columns
    • Dataframe with (file, start_time) as columns
    • Dataframe with file as index
    • Dataframe with (file) as column
    • CategoricalLabels object

  • preprocessor – preprocessor object, eg AudioPreprocessor or SpectrogramPreprocessor

  • overlap_fraction – see opensoundscape.utils.generate_clip_times_df

  • clip_overlap – see opensoundscape.utils.generate_clip_times_df

  • clip_step – see opensoundscape.utils.generate_clip_times_df

  • final_clip – see opensoundscape.utils.generate_clip_times_df

  • bypass_augmentations – if True, don’t apply any augmentations [default: True]

  • invalid_sample_behavior – how to handle samples that fail to preprocess, one of “substitute”, “placeholder”, “raise”, or “none”:
    • “substitute”: pick another sample
    • “placeholder”: return a placeholder value (zeros) for the sample
    • “raise”: raise the error
    • “none”: return None

  • collate_fn – function to collate a list of AudioSample objects into batches. If None, uses collate_fn=collate_audio_samples to return a tuple of (data, labels) tensors. The default is identity, which returns a list of AudioSample objects (no collation)

  • audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), samples must contain full paths to files

  • **kwargs – any arguments to torch.utils.data.DataLoader

Returns:

DataLoader that returns lists of AudioSample objects when iterated (if collate_fn is identity)

preprocessor

do not override or modify this attribute, as it will have no effect

samples

do not override or modify this attribute, as it will have no effect

opensoundscape.ml.dataloaders.collate_audio_samples(samples)[source]

generate batched tensors of data and labels from list of AudioSample

assumes that s.data is a Tensor and s.labels is a list/array for each item in samples, and that every sample has labels for the same classes.

Parameters:

samples – iterable of AudioSample objects (or other objects with attributes .data as Tensor and .labels as list/array)

Returns:

(samples, labels) tensors of shape (batch_size, *) & (batch_size, n_classes)
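The described behavior amounts to stacking each sample's .data tensor along a new batch dimension and turning the per-sample label lists into one 2-D tensor. The sketch below uses SimpleNamespace stand-ins for AudioSample objects; collate is a hypothetical name and the float32 label dtype is an assumption.

```python
from types import SimpleNamespace

import torch


def collate(samples):
    """Sketch of the described behavior: stack data and labels into batch tensors."""
    data = torch.stack([s.data for s in samples])  # (batch_size, *sample_shape)
    labels = torch.tensor([list(s.labels) for s in samples], dtype=torch.float32)
    return data, labels  # labels: (batch_size, n_classes)


# stand-ins for AudioSample objects: .data is a Tensor, .labels is a list
samples = [SimpleNamespace(data=torch.rand(1, 64, 128), labels=[0, 1, 1]) for _ in range(4)]
data, labels = collate(samples)
print(data.shape, labels.shape)  # torch.Size([4, 1, 64, 128]) torch.Size([4, 3])
```

torch.stack requires every sample tensor to have the same shape, which is why all samples must be preprocessed to a common size before batching.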

opensoundscape.ml.dataloaders.collate_audio_samples_to_dict(samples)[source]

generate batched tensors of data and labels (in a dictionary). returns collated samples: a dictionary with keys “samples” and “labels”

assumes that s.data is a Tensor and s.labels is a list/array for each sample S, and that every sample has labels for the same classes.

Parameters:
  • samples – iterable of AudioSample objects (or other objects with attributes .data as Tensor and .labels as list/array)

Returns:

dictionary of {“samples”: batched tensor of samples, “labels”: batched tensor of labels}

opensoundscape.ml.datasets module

Preprocessors: pd.Series child with an action sequence & forward method

class opensoundscape.ml.datasets.AudioFileDataset(*args: Any, **kwargs: Any)[source]

Bases: Dataset

Base class for audio datasets with OpenSoundscape (use in place of torch Dataset)

Custom Dataset classes should subclass this class or its children.

Datasets in OpenSoundscape contain a Preprocessor object which is responsible for the procedure of generating a sample for a given input. The DataLoader handles a dataframe of samples (and potentially labels) and uses a Preprocessor to generate samples from them.

Parameters:
  • samples

    the files to generate predictions for. Can be:
    • a dataframe with index containing audio paths, OR
    • a dataframe with multi-index of (path, start_time, end_time) per clip, OR
    • a list or np.ndarray of audio file paths

    Notes for input dataframe:
    • df must have audio paths in the index.
    • If label_df has labels, the class names should be the columns, and the values of each row should be 0 or 1.
    • If data does not have labels, label_df will have no columns

  • preprocessor – an object of BasePreprocessor or its children which defines the operations to perform on input samples

  • bypass_augmentations – if True, skips Actions with .is_augmentation=True

  • audio_root – optionally pass a root directory (pathlib.Path or str) to prepend to each file path - if None (default), samples must contain full paths to files

  • **kwargs – passed to make_clip_df via _ingest_samples_argument

Returns:

sample (AudioSample object)

Raises:

PreprocessingError if exception is raised during __getitem__

Effects:

self.invalid_samples will contain a set of paths that did not successfully produce a list of clips with start/end times

audio_root

path to prepend to all audio file paths when loading

bypass_augmentations

if True, skips Actions with .is_augmentation=True

class_counts()[source]

count number of each label

classes

list of classes to which multi-hot labels correspond

classmethod from_categorical_df(categorical_labels, preprocessor, class_list, bypass_augmentations=False)[source]

Create AudioFileDataset from a DataFrame with a column listing categorical labels

e.g. where df[‘labels’] = [[‘a’,’b’], [], [‘a’,’c’]]

Parameters:
  • categorical_labels – DataFrame with index (file) or (file, start_time, end_time) and ‘label’ column containing lists of labels or integers corresponding to class names

  • preprocessor – Preprocessor object

  • bypass_augmentations – if True, skip augmentations with .is_augmentation=True

Returns:

AudioFileDataset object
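Converting categorical labels such as [['a','b'], [], ['a','c']] into one multi-hot row per sample can be sketched in pure Python. categorical_to_multihot is a hypothetical helper; treating integers as positions in class_list follows the parameter description above, while the exact handling inside the library is not shown here.

```python
def categorical_to_multihot(label_lists, class_list):
    """Hypothetical sketch: convert lists of class names or integer indices to 0/1 rows."""
    index = {name: i for i, name in enumerate(class_list)}
    rows = []
    for labels in label_lists:
        row = [0] * len(class_list)
        for label in labels:
            # integers are treated as positions in class_list, strings as class names
            row[label if isinstance(label, int) else index[label]] = 1
        rows.append(row)
    return rows


print(categorical_to_multihot([["a", "b"], [], ["a", "c"]], class_list=["a", "b", "c"]))
# [[1, 1, 0], [0, 0, 0], [1, 0, 1]]
```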

head(n=5)[source]

out-of-place copy of first n samples

performs df.head(n) on self.label_df

Parameters:
  • n – number of first samples to return [default: 5]; see pandas.DataFrame.head()

Returns:

a new dataset object

invalid_samples

set of file paths that raised exceptions during preprocessing

label_df

dataframe containing file paths, clip times, and multi-hot labels (one column per class)

preprocessor

Preprocessor object containing a .pipeline of ordered preprocessing operations

sample(**kwargs)[source]

out-of-place random sample

creates copy of object with n rows randomly sampled from label_df

Parameters:
  • **kwargs – passed to pandas.DataFrame.sample(), e.g. n or frac

Returns:

a new dataset object

class opensoundscape.ml.datasets.EmbeddingDataset(*args: Any, **kwargs: Any)[source]

Bases: Dataset

simple dataset wrapper for embedding features and labels

Parameters:
  • features – tensor or np.array of input features; the first dimension should index samples

  • labels – tensor or np.array of target labels; the first dimension should index samples
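A dataset "wrapper" of this kind only needs __len__ and __getitem__, the two methods a PyTorch DataLoader uses for map-style datasets. The sketch below (ArrayPairDataset is a hypothetical name) uses NumPy arrays to show the idea; it is not OpenSoundscape's EmbeddingDataset, and the 1280-dimensional embedding size is arbitrary.

```python
import numpy as np


class ArrayPairDataset:
    """Hypothetical minimal map-style dataset over paired feature/label arrays."""

    def __init__(self, features, labels):
        features = np.asarray(features)
        labels = np.asarray(labels)
        # the first dimension must index samples in both arrays
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same number of samples")
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        # one (features, labels) pair per sample
        return self.features[idx], self.labels[idx]


features = np.random.rand(10, 1280)  # 10 samples, 1280-dim embeddings
labels = np.zeros((10, 3))           # 10 samples, 3 classes
ds = ArrayPairDataset(features, labels)
x, y = ds[0]
```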

class opensoundscape.ml.datasets.HopliteDataset(*args: Any, **kwargs: Any)[source]

Bases: Dataset

Dataset that retrieves embeddings from a HopliteDB for given files and start times

property label_df
exception opensoundscape.ml.datasets.InvalidIndexError[source]

Bases: Exception

exception opensoundscape.ml.datasets.NoMatchingWindowIDsError[source]

Bases: Exception

opensoundscape.ml.lightning module

class opensoundscape.ml.lightning.LightningSpectrogramModule(*args: Any, **kwargs: Any)[source]

Bases: SpectrogramModule, LightningModule

fit_with_trainer(train_df, validation_df=None, epochs=1, batch_size=1, num_workers=0, save_path='.', invalid_samples_log='./invalid_training_samples.log', raise_errors=False, wandb_session=None, checkpoint_path=None, **kwargs)[source]

train the model on samples from train_df

If customized loss functions, networks, optimizers, or schedulers are desired, modify the respective attributes before calling this method.

Parameters:
  • train_df – a dataframe of files and labels for training the model - either has index file or multi-index (file,start_time,end_time)

  • validation_df – a dataframe of files and labels for evaluating the model [default: None means no validation is performed]

  • epochs – number of epochs to train for [default: 1]

  • batch_size – number of training files simultaneously passed through forward pass, loss function, and backpropagation

  • num_workers – number of parallel CPU tasks for preprocessing. Note: use 0 for a single (root) process (not 1)

  • save_path – location to save intermediate and best model objects [default: “.”, i.e. the current working directory]

  • save_interval – interval in epochs to save model object with weights [default:1] Note: the best model is always saved to best.model in addition to other saved epochs.

  • log_interval – interval in batches to print training loss/metrics

  • validation_interval – interval in epochs to test the model on the validation set. Note that the model will only update its best score and save the best.model file on epochs that it performs validation.

  • invalid_samples_log – file path: log all samples that failed in preprocessing (file written when training completes) - if None, does not write a file

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to (Note: can also pass logger kwarg with any Lightning logger object) - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb. For example:

        import wandb
        wandb.login(key=api_key)  # find your api_key at https://wandb.ai/settings
        session = wandb.init(entity='mygroup', project='project1', name='first_run')
        ...
        model.fit_with_trainer(..., wandb_session=session)
        session.finish()

  • **kwargs – any arguments to pytorch_lightning.Trainer(), such as accelerator, precision, logger, accumulate_grad_batches, etc. Note: the max_epochs kwarg is overridden by the epochs argument

Returns:

a trained pytorch_lightning.Trainer object

Effects:

If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of training and validation samples are preprocessed and logged to a table. Training progress, loss, and metrics are also logged. Use self.wandb_logging dictionary to change the number of samples logged.

forward(samples)[source]

standard Lightning method defining action to take on each batch for inference

typically returns logits (raw, untransformed model outputs)

load_weights(path, strict=True)[source]

load network weights state dict from a file

For instance, load weights saved with .save_weights(). This is an in-place operation.

Parameters:
  • path – file path with saved weights

  • strict – (bool) see torch.nn.Module.load_state_dict()

predict_with_trainer(samples, batch_size=1, num_workers=0, activation_layer=None, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='extend', bypass_augmentations=True, invalid_samples_log=None, raise_errors=False, return_invalid_samples=False, lightning_trainer_kwargs=None, dataloader_kwargs=None)[source]

Generate predictions on a set of samples

Return dataframe of model output scores for each sample. Optional activation layer for scores (softmax, sigmoid, softmax then logit, or None)

Parameters:
  • samples – the files to generate predictions for. Can be: - a dataframe with index containing audio paths, OR - a dataframe with multi-index (file, start_time, end_time), OR - a list (or np.ndarray) of audio file paths - a single file path (str or pathlib.Path)

  • batch_size – Number of files to load simultaneously [default: 1]

  • num_workers – parallelization (ie cpus or cores), use 0 for current process [default: 0]

  • activation_layer – Optionally apply an activation layer such as sigmoid or softmax to the raw outputs of the model. options: - None: no activation, return raw scores (ie logit, [-inf:inf]) - ‘softmax’: scores all classes sum to 1 - ‘sigmoid’: all scores in [0,1] but don’t sum to 1 - ‘softmax_and_logit’: applies softmax first then logit [default: None]

  • overlap_fraction – see opensoundscape.utils.generate_clip_times_df

  • clip_overlap – see opensoundscape.utils.generate_clip_times_df

  • clip_step – see opensoundscape.utils.generate_clip_times_df

  • final_clip – see opensoundscape.utils.generate_clip_times_df

  • bypass_augmentations – If False, Actions with is_augmentation==True are performed. Default True.

  • invalid_samples_log – if not None, samples that failed to preprocess will be listed in this text file.

  • raise_errors – if True, raise errors when preprocessing fails; if False, just log the errors to invalid_samples_log

  • wandb_session – a wandb session to log to - pass the value returned by wandb.init() to log progress to a Weights and Biases run - if None, does not log to wandb

  • return_invalid_samples – bool, if True, returns second argument, a set containing file paths of samples that caused errors during preprocessing [default: False]

  • lightning_trainer_kwargs – dictionary of keyword args to pass to __call__, which are then passed to lightning.Trainer.__init__ see lightning.Trainer documentation for options. [Default: None] passes no kwargs

  • dataloader_kwargs – dictionary of keyword args to self.predict_dataloader()

Returns:

df of post-activation_layer scores - if return_invalid_samples is True, returns (df,invalid_samples) where invalid_samples is a set of file paths that failed to preprocess

Effects:

(1) wandb logging If wandb_session is provided, logs progress and samples to Weights and Biases. A random set of samples is preprocessed and logged to a table. Progress over all batches is logged. After prediction, top scoring samples are logged. Use self.wandb_logging dictionary to change the number of samples logged or which classes have top-scoring samples logged.

(2) invalid sample logging If invalid_samples_log is not None, saves a list of all file paths that failed to preprocess in invalid_samples_log as a text file

Note: if loading an audio file raises a PreprocessingError, the scores for that sample will be np.nan
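The activation_layer options listed for this method can be sketched in NumPy as follows. This illustrates the math only, not the library's code path; the max-subtraction in softmax is a standard stability trick, and computing 'softmax_and_logit' as log(p / (1 - p)) on the softmax output is an assumption about that option.

```python
import numpy as np


def apply_activation(logits, activation_layer=None):
    """Sketch of the activation_layer options (not OpenSoundscape code)."""
    logits = np.asarray(logits, dtype=float)
    if activation_layer is None:
        return logits  # raw scores in (-inf, inf)
    if activation_layer == "sigmoid":
        return 1.0 / (1.0 + np.exp(-logits))  # each score in [0, 1] independently
    if activation_layer in ("softmax", "softmax_and_logit"):
        # subtract the row max for numerical stability
        shifted = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
        if activation_layer == "softmax":
            return probs  # each row sums to 1
        # logit(p) = log(p / (1 - p)), applied after softmax
        return np.log(probs / (1.0 - probs))
    raise ValueError(f"unknown activation_layer: {activation_layer}")


scores = np.array([[2.0, 0.0, -1.0]])
print(apply_activation(scores, "softmax").sum(axis=1))
```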

save(path, save_hooks=False, weights_only=False)[source]

save model with weights using Trainer.save_checkpoint()

load from saved file with LightningSpectrogramModule.load_from_checkpoint()

Note: saving and loading model objects across OpenSoundscape versions will not work properly. Instead, use .save_weights() and .load_weights() (but note that architecture, customizations to preprocessing, training params, etc will not be retained using those functions).

For maximum flexibility in further use, save the model with both .save() and .save_torch_dict() or .save_weights().

Parameters:
  • path – file path for saved model object

  • save_hooks – retain forward and backward hooks on modules [default: False] Note: True can cause issues when using wandb.watch()

save_weights(path)[source]

save just the weights of the network

This allows the saved weights to be used more flexibly than model.save() which will pickle the entire object. The weights are saved in a pickled dictionary using torch.save(self.network.state_dict())

Parameters:

path – location to save weights file

train(*args, **kwargs)[source]

inherit train() method from LightningModule rather than SpectrogramModule

This method only sets training mode to True or False; it does not perform training.

opensoundscape.ml.loss module

loss function classes to use with opensoundscape models

class opensoundscape.ml.loss.BCELossWeakNegatives(*args: Any, **kwargs: Any)[source]

Bases: BCEWithLogitsLoss

BCEWithLogitsLoss that applies a weak negative weight to nan labels in the target.

This is different from soft labeling: we treat nan labels as negatives, then apply element-wise weighting to reduce their contribution to the loss.

Parameters:
  • weak_negative_weight – weight to apply to nan labels in target

  • **kwargs – passed to nn.BCEWithLogitsLoss

forward(x, target)[source]
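The weak-negative idea can be sketched in NumPy: NaN targets are replaced by 0 and given a small element-wise weight, then the standard numerically stable BCE-with-logits formula is applied. The function name is hypothetical, the 0.01 default mirrors the weight mentioned for fit() below, and averaging over all elements (as nn.BCEWithLogitsLoss does with reduction="mean") is an assumption about this class.

```python
import numpy as np


def bce_with_logits_weak_negatives(logits, target, weak_negative_weight=0.01):
    """NumPy sketch: NaN targets become negatives (0) with a reduced weight."""
    logits = np.asarray(logits, dtype=float)
    target = np.asarray(target, dtype=float)
    is_nan = np.isnan(target)
    y = np.where(is_nan, 0.0, target)                # treat NaN labels as negatives
    w = np.where(is_nan, weak_negative_weight, 1.0)  # but down-weight them
    # numerically stable binary cross-entropy computed from logits
    elementwise = np.maximum(logits, 0) - logits * y + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(w * elementwise))


print(bce_with_logits_weak_negatives([[2.0, -1.0]], [[1.0, np.nan]]))
```

Unlike a soft label of, say, 0.01, this keeps the target at 0 and shrinks only the gradient contribution of the uncertain element.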
class opensoundscape.ml.loss.BCEWithLogitsLoss_hot(*args: Any, **kwargs: Any)[source]

Bases: BCEWithLogitsLoss

use pytorch’s nn.BCEWithLogitsLoss for one-hot labels by simply converting y from long to float

Parameters:

**kwargs – passed to nn.BCEWithLogitsLoss

forward(x, target)[source]
class opensoundscape.ml.loss.CrossEntropyLoss_hot(*args: Any, **kwargs: Any)[source]

Bases: CrossEntropyLoss

use pytorch’s nn.CrossEntropyLoss for one-hot labels by converting labels from 1-hot to integer labels

throws a ValueError if labels are not one-hot

Parameters:

**kwargs – passed to nn.CrossEntropyLoss

forward(x, target)[source]
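The one-hot to integer conversion described for CrossEntropyLoss_hot can be sketched as follows (BCEWithLogitsLoss_hot instead only casts the targets to float). The helper name is hypothetical and the validity check is an assumption about what "not one-hot" means: every row must contain exactly one 1 and zeros elsewhere.

```python
import numpy as np


def one_hot_to_indices(target):
    """Convert a (n_samples, n_classes) one-hot array to integer class labels.

    Raises ValueError unless each row has exactly one 1 and zeros elsewhere.
    """
    target = np.asarray(target)
    is_binary = np.isin(target, (0, 1)).all()
    if not is_binary or not (target.sum(axis=1) == 1).all():
        raise ValueError("labels must be one-hot")
    # the position of the single 1 in each row is the class index
    return target.argmax(axis=1)


print(one_hot_to_indices([[0, 1, 0], [1, 0, 0]]))
```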
class opensoundscape.ml.loss.ResampleLoss(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(cls_score, label, weight=None, reduction_override=None)[source]
logit_reg_functions(labels, logits, weight=None)[source]
rebalance_weight(gt_labels)[source]
reweight_functions(label)[source]
opensoundscape.ml.loss.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None)[source]

helper function for BCE loss in ResampleLoss class

opensoundscape.ml.loss.reduce_loss(loss, reduction)[source]

Reduce loss as specified.

Parameters:
  • loss (Tensor) – Elementwise loss tensor.

  • reduction (str) – Options are “none”, “mean” and “sum”.

Returns:

Reduced loss tensor.

Return type:

Tensor

opensoundscape.ml.loss.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]

Apply element-wise weight and reduce loss.

Parameters:
  • loss (Tensor) – Element-wise loss.

  • weight (Tensor) – Element-wise weights.

  • reduction (str) – Same as built-in losses of PyTorch.

  • avg_factor (float) – Average factor when computing the mean of losses.

Returns:

Processed loss values.

Return type:

Tensor
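The two helpers above follow a pattern common in detection codebases such as mmdetection. The NumPy sketch below shows that pattern under the assumption that avg_factor replaces the element count in the mean; OpenSoundscape's Tensor-based code may differ in details.

```python
import numpy as np


def reduce_loss(loss, reduction):
    """Reduce an element-wise loss: 'none', 'mean', or 'sum'."""
    if reduction == "none":
        return loss
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    raise ValueError(f"invalid reduction: {reduction}")


def weight_reduce_loss(loss, weight=None, reduction="mean", avg_factor=None):
    """Apply element-wise weights, then reduce."""
    if weight is not None:
        loss = loss * weight
    if avg_factor is None:
        return reduce_loss(loss, reduction)
    if reduction == "mean":
        # divide the summed loss by avg_factor instead of the element count
        return loss.sum() / avg_factor
    if reduction == "none":
        return loss
    raise ValueError('avg_factor can only be used with reduction="mean" or "none"')


loss = np.array([1.0, 2.0, 3.0])
weight = np.array([1.0, 0.0, 1.0])
print(weight_reduce_loss(loss, weight, "mean"))
```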

opensoundscape.ml.safe_dataset module

Dataset wrapper to handle errors gracefully in Preprocessor classes

A SafeDataset handles errors in a potentially misleading way: If an error is raised while trying to load a sample, the SafeDataset will instead load a different sample. The indices of any samples that failed to load will be stored in ._invalid_indices.

This behavior may be desirable for training a model, but it could cause silent errors when using a model for prediction (replacing a bad file with a different file), so always check ._invalid_indices after using a SafeDataset.

based on an implementation by @msamogh in nonechucks (github.com/msamogh/nonechucks/)

class opensoundscape.ml.safe_dataset.SafeDataset(dataset, invalid_sample_behavior)[source]

Bases: object

A wrapper for a Dataset that handles errors when loading samples

WARNING: When iterating, will skip the failed sample, but when using within a DataLoader, finds the next good sample and uses it for the current index (see __getitem__).

Note that this class does not subclass Dataset. Instead, it contains a .dataset attribute that is a Dataset (or a subclass such as AudioFileDataset).

Parameters:
  • dataset – a torch Dataset instance or child such as AudioFileDataset

  • eager_eval – If True, checks if every file is able to be loaded during initialization (logs _valid_indices and _invalid_indices)

Attributes: _valid_indices and _invalid_indices can be accessed later to check which samples raised Exceptions.

__getitem__(index)[source]

If loading an index fails, keeps trying the next index until success

_safe_get_item()[source]

Tries to load a sample, returns None if error occurs

__iter__()[source]

generator that skips samples that raise errors when loading

report(log=None)[source]

write invalid samples to log file, give warning, & return invalid samples
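The two behaviors described above (substituting the next loadable sample on indexing, skipping failures on iteration) can be sketched in plain Python. RetryingDataset and FlakyList are hypothetical names, and wrapping around to the start of the dataset is an assumption of this sketch, not necessarily what SafeDataset does.

```python
class RetryingDataset:
    """Hypothetical wrapper illustrating the 'try the next index' behavior."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.invalid_indices = set()

    def __len__(self):
        return len(self.dataset)

    def _safe_get_item(self, index):
        # return None (and record the index) if loading raises
        try:
            return self.dataset[index]
        except Exception:
            self.invalid_indices.add(index)
            return None

    def __getitem__(self, index):
        # walk forward (wrapping around) until a sample loads
        for offset in range(len(self.dataset)):
            i = (index + offset) % len(self.dataset)
            sample = self._safe_get_item(i)
            if sample is not None:
                return sample
        raise IndexError("no sample in the dataset could be loaded")

    def __iter__(self):
        # skip failures entirely when iterating
        for i in range(len(self.dataset)):
            sample = self._safe_get_item(i)
            if sample is not None:
                yield sample


class FlakyList(list):
    """Toy dataset whose item 1 always fails to load."""

    def __getitem__(self, i):
        if i == 1:
            raise RuntimeError("corrupt file")
        return super().__getitem__(i)


ds = RetryingDataset(FlakyList(["a", "b", "c"]))
print(ds[1], list(ds), ds.invalid_indices)
```

Note that ds[1] silently returns the sample at index 2, which is exactly the substitution risk the module notes warn about for prediction.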

opensoundscape.ml.sampling module

classes for strategically sampling within a DataLoader

class opensoundscape.ml.sampling.ClassAwareSampler(*args: Any, **kwargs: Any)[source]

Bases: Sampler

In each batch of samples, pick a limited number of classes to include and give even representation to each class

class opensoundscape.ml.sampling.ImbalancedDatasetSampler(*args: Any, **kwargs: Any)[source]

Bases: Sampler

Samples elements randomly from a given list of indices for an imbalanced dataset

Parameters:
  • indices (list, optional) – a list of indices

  • num_samples (int, optional) – number of samples to draw

  • callback_get_label (func) – a callback-like function which takes two arguments: dataset and index

Based on Imbalanced Dataset Sampling by davinnovation (https://github.com/ufoym/imbalanced-dataset-sampler)
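The core idea of the cited imbalanced-dataset sampler is to draw samples with probability inversely proportional to the frequency of their label. A NumPy sketch for single-label data follows; the function name and the 90/10 class split are illustrative assumptions, not the class's exact internals.

```python
import numpy as np


def inverse_frequency_weights(labels):
    """Per-sample probabilities proportional to 1 / (count of that sample's label)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    count_of = dict(zip(classes.tolist(), counts.tolist()))
    weights = np.array([1.0 / count_of[lab] for lab in labels.tolist()])
    return weights / weights.sum()  # normalize so the weights form a distribution


labels = ["common"] * 90 + ["rare"] * 10
weights = inverse_frequency_weights(labels)
rng = np.random.default_rng(0)
# draw indices with replacement; each class now has roughly equal chance per draw
drawn = rng.choice(len(labels), size=1000, replace=True, p=weights)
```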

class opensoundscape.ml.sampling.RandomCycleIter(data, test_mode=False)[source]

Bases: object

opensoundscape.ml.sampling.class_aware_sample_generator(cls_iter, data_iter_list, n, num_samples_cls=1)[source]
opensoundscape.ml.sampling.get_sampler()[source]

opensoundscape.ml.shallow_classifier module

class opensoundscape.ml.shallow_classifier.MLPClassifier(*args: Any, **kwargs: Any)[source]

Bases: Module

initialize a fully connected NN (MLP) with ReLU activations

Parameters:
  • input_size – length of 1-d tensors passed as input samples

  • output_size – number of classes at the output layer

  • hidden_layer_sizes – default () (empty tuple) creates a single linear layer with no hidden layers; to add hidden layers, pass a sequence with one element per hidden layer giving its size. For example, (100,) creates 1 hidden layer with 100 elements

  • classes (optional) – list of class names, if provided should have len=output_size - default: None

  • weights – optionally pass a pytorch weight_dict of model weights to load default None initializes the model with random weights
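To make the hidden_layer_sizes convention concrete, the sketch below lists the (in_features, out_features) pair of each fully connected layer implied by a given configuration. It is a hypothetical helper, not part of MLPClassifier; ReLU activations would sit between consecutive layers, and the 1280-dimensional input is an arbitrary example.

```python
def mlp_layer_shapes(input_size, output_size, hidden_layer_sizes=()):
    """Return the (in_features, out_features) pair for each linear layer."""
    sizes = [input_size, *hidden_layer_sizes, output_size]
    return list(zip(sizes[:-1], sizes[1:]))


print(mlp_layer_shapes(1280, 10))          # [(1280, 10)]
print(mlp_layer_shapes(1280, 10, (100,)))  # [(1280, 100), (100, 10)]
```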

fit(train_features, train_labels, validation_features=None, validation_labels=None, batch_size=128, steps=1000, optimizer=None, criterion=None, device=torch.device, validation_interval=1, logging_interval=100, early_stopping_patience=None)[source]

train a PyTorch model on features and labels with batching and early stopping

Assumes all data can fit in memory. Training uses batched DataLoaders for efficient processing. If validation data is provided, the model with the lowest validation loss is automatically restored at the end of training (early stopping).

Defaults are for multi-target label problems and assume train_labels is an array of 0/1 of shape (n_samples, n_classes)

Parameters:
  • model – a torch.nn.Module object to train

  • train_features – input features for training, often embeddings; should be a valid input to model(); generally shape (n_samples, n_features)

  • train_labels – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()

  • validation_features – input features for validation; if None, does not perform validation

  • validation_labels – labels for validation; if None, does not perform validation

  • batch_size – batch size for training; if fewer samples than batch_size, the entire dataset is used as a single batch [Default: 128]

  • steps – number of training steps (forward/backward passes on one batch) [Default: 1000]

  • optimizer – torch.optim optimizer to use; default None uses AdamW

  • criterion – loss function to use; default None uses BCELossWeakNegatives() (appropriate for multi-label classification); this loss function treats NaN labels as weak negatives, using a default weight of 0.01 for NaN labels compared to strong labels

  • device – torch.device to use; default is torch.device(‘cpu’); can also be e.g. torch.device(‘cuda:0’) for the first CUDA GPU or torch.device(‘mps’) for a Mac with M1/M2

  • validation_interval – how often to validate the model during training; if validation_features and validation_labels are provided, validation is performed every validation_interval steps

  • logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed

  • early_stopping_patience – if provided and validation data is available, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]
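The early-stopping bookkeeping described above (patience counted in training steps, best model restored at the end) can be simulated without a model. In this sketch val_losses is a hypothetical list of validation losses, one per validation run at steps validation_interval, 2 * validation_interval, and so on; a real loop would also copy the model weights whenever the best loss improves.

```python
def run_with_early_stopping(val_losses, validation_interval=1, early_stopping_patience=None):
    """Simulate early stopping measured in training steps.

    Returns (best_step, best_loss, last_step_run).
    """
    best_loss = float("inf")
    best_step = None
    step = 0
    for k, loss in enumerate(val_losses):
        step = (k + 1) * validation_interval
        if loss < best_loss:
            # a real loop would save a copy of the model weights here
            best_loss, best_step = loss, step
        elif (
            early_stopping_patience is not None
            and best_step is not None
            and step - best_step >= early_stopping_patience
        ):
            break  # no improvement for `patience` steps (not validation evaluations)
    return best_step, best_loss, step


losses = [1.0, 0.8, 0.9, 0.95, 0.97, 0.99]
print(run_with_early_stopping(losses, validation_interval=10, early_stopping_patience=20))
```

With validation every 10 steps and a patience of 20 steps, training stops at step 40 (two validations without improvement) and the weights from step 20 would be restored.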

forward(x)[source]
classmethod from_torch_linear(linear_layer, classes=None)[source]

initialize an MLPClassifier from a torch.nn.Linear layer

Initializes 1-layer MLP, copying weights from linear_layer

Parameters:
  • linear_layer – a torch.nn.Linear layer whose weight and bias will be used to initialize the classifier layer of the MLPClassifier

  • classes (optional) – list of class names, if provided should have len=output_size default: None

classmethod load(path, **kwargs)[source]

load object saved with self.save(); **kwargs like map_location are passed to torch.load

save(path)[source]
opensoundscape.ml.shallow_classifier.augmented_embed(embedding_model, sample_df, n_augmentation_variants, batch_size=1, num_workers=0, device=torch.device, audio_root=None)[source]

Embed samples using augmentation during preprocessing

Parameters:
  • embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model or a Bioacoustics Model Zoo model like Perch, BirdNET, or HawkEars)

  • sample_df – dataframe with samples to embed

  • n_augmentation_variants – number of augmented variants to generate for each sample

  • batch_size – batch size for embedding; default 1

  • num_workers – number of workers for embedding; default 0

  • device – torch.device to use; default is torch.device(‘cpu’)

Returns:

the embedded training samples and their labels, as torch.tensors

Return type:

x_train, y_train
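The shape bookkeeping for augmented embedding can be sketched with NumPy: each of the n_augmentation_variants passes embeds every sample once (with fresh random augmentation), and the label rows are repeated to match. embed_fn and fake_embed are hypothetical stand-ins for an embedding model, and the sketch returns NumPy arrays rather than the torch tensors this function returns.

```python
import numpy as np


def stack_augmented_variants(embed_fn, samples, labels, n_augmentation_variants):
    """Embed every sample n times and repeat its label row to match."""
    x_parts, y_parts = [], []
    for _ in range(n_augmentation_variants):
        x_parts.append(embed_fn(samples))  # a fresh augmented pass each time
        y_parts.append(np.asarray(labels))
    return np.concatenate(x_parts), np.concatenate(y_parts)


rng = np.random.default_rng(0)


def fake_embed(samples):
    # random 8-dim "embeddings" standing in for an augmented model pass
    return rng.normal(size=(len(samples), 8))


x_train, y_train = stack_augmented_variants(
    fake_embed, ["a.wav", "b.wav", "c.wav"], [[1, 0], [0, 1], [1, 1]], n_augmentation_variants=4
)
```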

opensoundscape.ml.shallow_classifier.count_dets_hoplite(db, classifier, classes, min_score=None, max_score=None, score_bins=None, batch_size=1024, date_range=None, time_range=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, progress_bar=False)[source]

Count detections in score bins/ranges based on classifier predictions and filters

Compared to select_from_hoplite, this function does not return the selected clips but just counts the number of clips in each score bin/range for each class. This can be quick and memory efficient for counting detections in large datasets if you don’t need clip info.

Parameters:
  • db – hoplite database containing embeddings

  • classifier – MLPClassifier object or other classifier object to call on the torch.tensor embeddings

  • classes – list of class names to select clips for; if None, selects clips for every class in classifier

  • min_score – minimum score to filter clips by existing score in the database; if None, does not threshold by min score

  • max_score – maximum score to filter clips by existing score in the database; if None, does not restrict by max score

  • score_bins – if provided, a list of tuples (low, high) of score ranges in which to count detections

    • if None, reports all scores above min_score and below max_score in a single bin

    • if provided, min_score and max_score are ignored and bins are determined by score_bins

  • batch_size – n samples simultaneously processed when applying classifier to embeddings; default 1024

  • date_range – tuple of (start_date, end_date) to filter clips by date; Formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format; if None, does not filter by date Can pass (date,None) or (None,date) to filter by only start or end date, respectively

  • time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day. Formats: datetime.datetime, datetime.time or string in “HH:MM:SS” format. Note: filters by time of day of the _recording_ start time (rather than audio clip start time). Assumes time zone match between time_range values and recording timestamps in the database

  • deployments – list of deployment names to filter by; if None, does not filter by deployment

  • projects – list of project names to filter by; if None, does not filter by project

  • recordings – list of recording names to filter by; if None, does not filter by recording

  • deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument

  • recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument

  • windows_filter – custom filter dict for windows; if provided, overrides the date_range and time_range arguments

  • annotations_filter – custom filter dict for annotations in hoplite DB
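The date_range and time_range semantics above (several accepted formats, None for an open end, filtering on the recording start time) can be sketched with the standard library. The helper names are hypothetical; treating endpoints as inclusive is an assumption, and this simple comparison does not handle a time_range that crosses midnight.

```python
import datetime as dt


def to_date(value):
    """Normalize None, datetime, date, or 'YYYY-MM-DD' string to a date (or None)."""
    if value is None:
        return None
    if isinstance(value, dt.datetime):  # check before date: datetime subclasses date
        return value.date()
    if isinstance(value, dt.date):
        return value
    return dt.datetime.strptime(value, "%Y-%m-%d").date()


def to_time(value):
    """Normalize None, datetime, time, or 'HH:MM:SS' string to a time (or None)."""
    if value is None:
        return None
    if isinstance(value, dt.datetime):
        return value.time()
    if isinstance(value, dt.time):
        return value
    return dt.datetime.strptime(value, "%H:%M:%S").time()


def recording_passes(recording_start, date_range=None, time_range=None):
    """True if a recording's start datetime falls inside both (optional) ranges."""
    if date_range is not None:
        lo, hi = (to_date(v) for v in date_range)
        d = recording_start.date()
        if (lo is not None and d < lo) or (hi is not None and d > hi):
            return False
    if time_range is not None:
        lo, hi = (to_time(v) for v in time_range)
        t = recording_start.time()
        if (lo is not None and t < lo) or (hi is not None and t > hi):
            return False
    return True


start = dt.datetime(2024, 5, 10, 6, 30)
print(recording_passes(start, ("2024-05-01", None), ("05:00:00", "08:00:00")))
```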

Returns:

dict of dicts with counts[class][bin_range] = count of clips for class in score bin; if score_bins is None, bin_range is (min_score, max_score) (if min_score and/or max_score are also None, uses -inf &/or +inf)

Return type:

counts
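The counts[class][bin_range] structure can be illustrated on an in-memory score array. This sketch assumes each bin includes its lower edge and excludes its upper edge (low <= score < high); the function's actual edge handling is not stated above. The class names and scores are made up.

```python
import numpy as np


def count_in_bins(scores, classes, score_bins=None, min_score=None, max_score=None):
    """Count clips per class per score range.

    scores: (n_clips, n_classes) array of classifier outputs.
    """
    scores = np.asarray(scores, dtype=float)
    if score_bins is None:
        # single bin from min_score to max_score, open-ended when either is None
        lo = -np.inf if min_score is None else min_score
        hi = np.inf if max_score is None else max_score
        score_bins = [(lo, hi)]
    counts = {}
    for j, cls in enumerate(classes):
        col = scores[:, j]
        counts[cls] = {
            (lo, hi): int(((col >= lo) & (col < hi)).sum()) for lo, hi in score_bins
        }
    return counts


scores = np.array([[0.1, 2.0], [1.5, -1.0], [3.0, 0.5]])
counts = count_in_bins(scores, ["robin", "wren"], score_bins=[(0, 1), (1, 5)])
print(counts)
```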

opensoundscape.ml.shallow_classifier.fit(model, train_features, train_labels, validation_features=None, validation_labels=None, batch_size=128, steps=1000, optimizer=None, criterion=None, device=torch.device, validation_interval=1, logging_interval=100, early_stopping_patience=None)[source]

train a PyTorch model on features and labels with batching and early stopping

Assumes all data can fit in memory. Training uses batched DataLoaders for efficient processing. If validation data is provided, the model with the lowest validation loss is automatically restored at the end of training (early stopping).

Defaults are for multi-target label problems and assume train_labels is an array of 0/1 of shape (n_samples, n_classes)

Parameters:
  • model – a torch.nn.Module object to train

  • train_features – input features for training, often embeddings; should be a valid input to model(); generally shape (n_samples, n_features)

  • train_labels – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()

  • validation_features – input features for validation; if None, does not perform validation

  • validation_labels – labels for validation; if None, does not perform validation

  • batch_size – batch size for training; if fewer samples than batch_size, the entire dataset is used as a single batch [Default: 128]

  • steps – number of training steps (forward/backward passes on one batch) [Default: 1000]

  • optimizer – torch.optim optimizer to use; default None uses AdamW

  • criterion – loss function to use; default None uses BCELossWeakNegatives() (appropriate for multi-label classification); this loss function treats NaN labels as weak negatives, using a default weight of 0.01 for NaN labels compared to strong labels

  • device – torch.device to use; default is torch.device(‘cpu’); can also be e.g. torch.device(‘cuda:0’) for the first CUDA GPU or torch.device(‘mps’) for a Mac with M1/M2

  • validation_interval – how often to validate the model during training; if validation_features and validation_labels are provided, validation is performed every validation_interval steps

  • logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed

  • early_stopping_patience – if provided and validation data is available, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]

opensoundscape.ml.shallow_classifier.fit_classifier_on_embeddings(embedding_model, classifier_model, train_df, validation_df, n_augmentation_variants=0, embedding_batch_size=1, embedding_num_workers=0, steps=1000, optimizer=None, criterion=None, device=torch.device, early_stopping_patience=None, logging_interval=100, validation_interval=1, audio_root=None)[source]

Embed samples with an embedding model, then fit a classifier on the embeddings

wraps embedding_model.embed() with fit(clf,…)

Also supports generating augmented variations of the training samples

Note: if embedding takes a while and you might want to fit multiple times, consider embedding the samples first then running fit(…) rather than calling this function.

Parameters:
  • embedding_model – a model with an embed() method that takes a dataframe and returns embeddings (e.g. a pretrained opensoundscape model or a Bioacoustics Model Zoo model like Perch, BirdNET, or HawkEars)

  • classifier_model – a torch.nn.Module object to train, e.g. MLPClassifier or final layer of CNN

  • train_df – dataframe with training samples and labels; see opensoundscape.ml.cnn.train() train_df argument

  • validation_df – dataframe with validation samples and labels (see the validation_df argument of opensoundscape.ml.cnn.train()); if None, skips validation

  • n_augmentation_variants – if 0 (default), embeds training samples without augmentation; if >0, embeds each training sample with stochastic augmentation n_augmentation_variants times

  • embedding_batch_size – batch size for embedding; default 1

  • embedding_num_workers – number of workers for embedding; default 0

  • steps – model fitting parameters, see fit()

  • optimizer – model fitting parameters, see fit()

  • criterion – model fitting parameters, see fit()

  • device – model fitting parameters, see fit()

  • early_stopping_patience – if provided, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]

  • logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed

  • validation_interval – how often to validate the model during training; if validation_df is provided, validation is performed every validation_interval steps

  • audio_root – if provided, used as prefix for audio files in train_df and validation_df; if None, assumes train_df and validation_df already have absolute audio paths

Returns:

the embedded training and validation samples and their labels, as torch.tensor, plus a dictionary of validation metrics for the best model found during training

Return type:

x_train, y_train, x_val, y_val, metrics

opensoundscape.ml.shallow_classifier.fit_on_hoplite(classifier, hoplite_db, train_df, validation_df=None, batch_size=128, steps=10000, optimizer=None, criterion=None, device=torch.device, validation_interval=100, logging_interval=100, early_stopping_patience=None, progress_bar=False, **kwargs)[source]

train a PyTorch classifier on Hoplite Embedding DB and label dataframe

Defaults are for multi-target label problems and assume train_df is a dataframe of 0/1 per class with multi-index (file, start_time, end_time)

Parameters:
  • classifier – a torch.nn.Module object to train

  • hoplite_db – a HopliteDB instance containing the embeddings to train on

  • train_df – labels for training, generally one-hot encoded with shape (n_samples, n_classes); should be a valid target for criterion()

  • validation_df – labels for validation; if None, does not perform validation

  • validation_labels – labels for validation; if None, does not perform validation

  • batch_size – batch size for training; if fewer samples than batch_size, the entire dataset is used as a single batch [Default: 128]

  • steps – number of training steps (epochs): in each step, all data is passed forward and backward, and the optimizer updates the weights [Default: 10_000]

  • optimizer – torch.optim optimizer to use; default None uses AdamW

  • criterion – loss function to use; default None uses BCELossWeakNegatives() (appropriate for multi-label classification); this loss function treats NaN labels as weak negatives, using a default weight of 0.01 for NaN labels compared to strong labels

  • device – torch.device to use; default is torch.device(‘cpu’)

  • validation_interval – how often to validate the model during training; if validation_df is provided, validation is performed every validation_interval steps

  • logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed

  • early_stopping_patience – if provided and validation data is available, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]

  • progress_bar – whether to show a progress bar during training; default False

  • **kwargs – additional keyword arguments passed to HopliteDataset; see HopliteDataset.__init__()

opensoundscape.ml.shallow_classifier.get_embeddings_from_hoplite(db, samples, **kwargs)[source]
opensoundscape.ml.shallow_classifier.predict_on_hoplite(db, samples, classifier, clip_duration=None, batch_size=1024, return_df=True, device=torch.device, **kwargs)[source]

Apply model to embeddings from database for each clip in samples

Parameters:
  • db – hoplite database containing embeddings

  • samples – a dataframe of clips or a list of audio files; a dataframe should have columns “file”, “start_time”, “end_time” specifying the clips to apply the model to

  • classifier – MLPClassifier object or other classifier object to call on the torch.tensor embeddings

  • clip_duration – provide clip length (s) if passing files rather than pre-defined file/start_time/end_time clips

  • batch_size – n samples simultaneously processed when applying classifier to embeddings; default 1024

  • return_df – if True, returns a dataframe with the same index as samples and one column per class (named from classifier.classes if available, otherwise integer column names); if False, returns a numpy array of predictions

  • **kwargs – additional keyword arguments to pass to HopliteDataset

Returns:

predictions for each clip

Return type:

pandas.DataFrame or numpy.ndarray

See also

select_from_hoplite if samples are already embedded and you wish to select filtered (random/top-scoring/all) clips
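The batching and column-naming behavior described for batch_size and return_df can be illustrated without a hoplite database. This is a minimal numpy sketch under stated assumptions: embeddings are already in memory as an (n_clips, dim) array, and the classifier is any callable returning (n_clips, n_classes) scores. `predict_in_batches`, `output_column_names`, and `ToyLinear` are hypothetical names, not opensoundscape functions, and the library itself works with torch tensors.

```python
import numpy as np


def predict_in_batches(embeddings, classifier_fn, batch_size=1024):
    """Apply classifier_fn to `embeddings` batch_size rows at a time and stack the scores."""
    embeddings = np.asarray(embeddings)
    outputs = [
        classifier_fn(embeddings[start : start + batch_size])
        for start in range(0, len(embeddings), batch_size)
    ]
    return np.concatenate(outputs, axis=0)


def output_column_names(classifier, n_outputs):
    """Use classifier.classes as column names if present, otherwise integers 0..n-1."""
    classes = getattr(classifier, "classes", None)
    return list(classes) if classes is not None else list(range(n_outputs))


class ToyLinear:
    """Hypothetical stand-in classifier: a fixed linear layer with a .classes attribute."""

    classes = ["robin", "wren"]

    def __init__(self, weights):
        self.weights = np.asarray(weights)

    def __call__(self, x):
        return x @ self.weights


clf = ToyLinear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
emb = np.arange(15, dtype=float).reshape(5, 3)
scores = predict_in_batches(emb, clf, batch_size=2)  # 3 batches: 2 + 2 + 1 rows
print(output_column_names(clf, scores.shape[1]))  # ['robin', 'wren']
```

Batching only bounds peak memory; the stacked result is identical to applying the classifier to all embeddings at once.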

opensoundscape.ml.shallow_classifier.select_from_hoplite(db, classifier, classes, k=5, strategy: Literal['top_k', 'random_k', 'all'] = 'top_k', batch_size=1024, date_range=None, time_range=None, min_score=None, max_score=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, random_state=None, return_windows=False, progress_bar=False, warn_no_matches=False)[source]

Extract top-scoring or random clips from the database based on classifier predictions and filters

Parameters:
  • db – hoplite database containing embeddings

  • classifier – MLPClassifier object or other classifier object to call on the torch.tensor embeddings

  • classes – list of class names to select clips for; if None, selects clips for every class in classifier

  • k – number of clips to return per class; default 5 (ignored if strategy=”all”)

  • strategy – which clips to select; default “top_k”:

    • “top_k”: return the top k clips for each class

    • “random_k”: return k random clips

    • “all”: return all clips (ignores k)

  • batch_size – number of samples processed simultaneously when applying classifier to embeddings; default 1024

  • date_range – tuple of (start_date, end_date) to filter clips by date; formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format. If None, does not filter by date. Can pass (date, None) or (None, date) to filter by only start or end date, respectively

  • time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day. Formats: datetime.datetime, datetime.time, or string in “HH:MM:SS” format. Note: filters by time of day of the _recording_ start time (rather than the audio clip start time), and assumes the time zone of the time_range values matches the recording timestamps in the database

  • min_score – minimum score to filter clips by existing score in the database; if None, does not threshold by min score

  • max_score – maximum score to filter clips by existing score in the database; if None, does not restrict by max score

  • deployments – list of deployment names to filter by; if None, does not filter by deployment

  • projects – list of project names to filter by; if None, does not filter by project

  • recordings – list of recording names to filter by; if None, does not filter by recording

  • deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument

  • recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument

  • windows_filter – custom filter dict for windows; if provided, overrides date_range, time_range arguments

  • annotations_filter – custom filter dict for annotations in hoplite DB

  • warn_no_matches – if True, raises a warning if no clips are found for a class after applying filters and score thresholds; default False

Returns:

dict of {class_name: list of matching windows} if return_windows=True; otherwise a dataframe with columns for class, score, and window info
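The three strategy options can be illustrated on a plain {clip_id: score} dictionary for a single class. This is a hypothetical sketch (`select_clips` is not part of opensoundscape) that applies min_score/max_score to the scores it is given; that is an assumption, since the parameter descriptions above refer to existing scores in the database.

```python
import random


def select_clips(scores, strategy="top_k", k=5, min_score=None, max_score=None, random_state=None):
    """Pick clip ids for one class from a {clip_id: score} dict.

    strategy: "top_k" (k highest scores), "random_k" (k at random), or "all" (ignores k).
    min_score / max_score drop clips outside the score range before selecting.
    """
    candidates = [
        (clip, s)
        for clip, s in scores.items()
        if (min_score is None or s >= min_score) and (max_score is None or s <= max_score)
    ]
    if strategy == "top_k":
        ranked = sorted(candidates, key=lambda cs: cs[1], reverse=True)
        return [clip for clip, _ in ranked[:k]]
    if strategy == "random_k":
        rng = random.Random(random_state)  # seeded for reproducible draws
        return [clip for clip, _ in rng.sample(candidates, min(k, len(candidates)))]
    if strategy == "all":
        return [clip for clip, _ in candidates]  # keeps the input order
    raise ValueError(f"unknown strategy: {strategy!r}")


scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5}
print(select_clips(scores, "top_k", k=2))  # ['a', 'c']
print(select_clips(scores, "all", min_score=0.5))  # ['a', 'c', 'd']
```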

opensoundscape.ml.utils module

Utilities for .ml

opensoundscape.ml.utils.apply_activation_layer(x, activation_layer=None)[source]

applies an activation layer to a set of scores

Parameters:
  • x – input values

  • activation_layer – one of:

    • None [default]: return original values

    • ‘softmax’: apply softmax activation

    • ‘sigmoid’: apply sigmoid activation

    • ‘softmax_and_logit’: apply softmax then logit transform

Returns:

values with activation layer applied; if x is None, returns None

Note: casts x to float before applying softmax, since torch’s softmax implementation doesn’t support int or Long type
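For reference, the activation options can be written out with numpy. This sketch assumes scores are laid out as (n_samples, n_classes) with softmax taken over the last (class) axis; the library itself operates on torch tensors, and `apply_activation` is a hypothetical name. Note that softmax_and_logit yields ±inf wherever a softmax probability is exactly 0 or 1.

```python
import numpy as np


def apply_activation(x, activation_layer=None):
    """numpy sketch of the activation options; returns None if x is None."""
    if x is None:
        return None
    x = np.asarray(x, dtype=float)  # cast ints to float, mirroring the note above
    if activation_layer is None:
        return x
    if activation_layer == "sigmoid":
        return 1.0 / (1.0 + np.exp(-x))
    if activation_layer in ("softmax", "softmax_and_logit"):
        shifted = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
        exp = np.exp(shifted)
        probs = exp / exp.sum(axis=-1, keepdims=True)
        if activation_layer == "softmax":
            return probs
        return np.log(probs / (1.0 - probs))  # logit of each softmax probability
    raise ValueError(f"unknown activation_layer: {activation_layer!r}")


print(apply_activation([[0, 0]], "softmax"))  # [[0.5 0.5]]
```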

opensoundscape.ml.utils.cas_dataloader(dataset, batch_size, num_workers)[source]

Return a dataloader that uses the class aware sampler

Class aware sampler tries to balance the examples per class in each batch. It selects just a few classes to be present in each batch, then samples those classes for even representation in the batch.

Parameters:
  • dataset – a pytorch dataset type object

  • batch_size – see DataLoader

  • num_workers – see DataLoader
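A simplified version of the class-aware idea: pick a few classes at random, then draw samples from those classes in turn so each contributes about equally to the batch. This is a sketch, not the sampler opensoundscape uses; `class_aware_batch` and its `classes_per_batch` parameter are hypothetical, and samples are drawn with replacement.

```python
import random
from collections import defaultdict


def class_aware_batch(labels, batch_size, classes_per_batch=2, rng=None):
    """Return sample indices for one batch drawn evenly from a few randomly chosen classes.

    labels: list where labels[i] is the set of classes present in sample i.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)  # class -> indices of samples containing it
    for idx, sample_classes in enumerate(labels):
        for c in sample_classes:
            by_class[c].append(idx)
    if not by_class:
        raise ValueError("no positive labels to sample from")
    chosen = rng.sample(sorted(by_class), min(classes_per_batch, len(by_class)))
    batch = []
    # cycle through the chosen classes so each contributes ~batch_size / len(chosen) samples
    while len(batch) < batch_size:
        for c in chosen:
            if len(batch) == batch_size:
                break
            batch.append(rng.choice(by_class[c]))
    return batch


labels = [{"robin"}, {"wren"}, {"jay"}, {"robin", "wren"}, {"jay"}]
batch = class_aware_batch(labels, batch_size=6, classes_per_batch=2, rng=random.Random(0))
```

Rare classes are oversampled relative to their frequency, which is the point of balancing, at the cost of repeating their examples more often.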

opensoundscape.ml.utils.check_labels(label_df, classes)[source]

check that classes and label_df.columns are the same, otherwise raise a helpful error
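A sketch of the kind of comparison check_labels performs, written against a plain list of column names so it runs without pandas. Treating a difference in column order as a mismatch is an assumption here, and `check_label_columns` is a hypothetical name.

```python
def check_label_columns(label_columns, classes):
    """Raise a descriptive ValueError if label columns and classes differ."""
    columns, classes = list(label_columns), list(classes)
    if columns == classes:
        return None
    missing = [c for c in classes if c not in columns]
    extra = [c for c in columns if c not in classes]
    raise ValueError(
        "label columns do not match classes. "
        f"Classes missing from labels: {missing}; label columns not in classes: {extra}. "
        "If both lists are empty, the columns are in a different order."
    )
```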

opensoundscape.ml.utils.collate_audio_samples_to_tensors(batch)[source]

takes a list of AudioSample objects, returns batched tensors

Use this collate function with DataLoader if you want to use AudioFileDataset but want the traditional output of PyTorch DataLoaders: two tensors, the first containing the data and the second containing the labels, each with dim 0 as the batch dimension.

Parameters:

batch – a list of AudioSample objects

Returns:

(Tensor of stacked AudioSample.data, Tensor of stacked AudioSample.label.values)

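Example usage:

```python
from torch.utils.data import DataLoader

from opensoundscape import AudioFileDataset, SpectrogramPreprocessor
from opensoundscape.ml.utils import collate_audio_samples_to_tensors

# label_df is assumed to be defined already: a dataframe of labels
# with one column per class, indexed by audio file
preprocessor = SpectrogramPreprocessor(sample_duration=2, height=224, width=224)
audio_dataset = AudioFileDataset(label_df, preprocessor)

train_dataloader = DataLoader(
    audio_dataset,
    batch_size=64,
    shuffle=True,
    collate_fn=collate_audio_samples_to_tensors,
)
```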
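The stacking that the collate function performs can be mimicked with numpy. `ToySample` and `ToyLabel` are hypothetical stand-ins for AudioSample and its label Series (only `.data` and `.label.values` are used), and numpy arrays replace the torch tensors the real function returns.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyLabel:
    """Hypothetical stand-in for a labels Series: only .values is used."""

    values: np.ndarray


@dataclass
class ToySample:
    """Hypothetical stand-in for AudioSample with .data and .label attributes."""

    data: np.ndarray
    label: ToyLabel


def collate_to_arrays(batch):
    """Stack per-sample data and label values along a new leading batch dimension (dim 0)."""
    data = np.stack([sample.data for sample in batch])
    labels = np.stack([sample.label.values for sample in batch])
    return data, labels


batch = [ToySample(np.zeros((1, 4, 4)), ToyLabel(np.array([1, 0, 0]))) for _ in range(8)]
data, labels = collate_to_arrays(batch)
print(data.shape, labels.shape)  # (8, 1, 4, 4) (8, 3)
```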

opensoundscape.ml.utils.get_batch(array, batch_size, batch_number)[source]

get a single slice of a larger array using the batch size and the batch index (counting from zero)

Parameters:
  • array – iterable to split into batches

  • batch_size – num elements per batch

  • batch_number – index of batch

Returns:

one batch (subset of array)

Note: the final elements are returned as the last batch even if there are fewer than batch_size

Example

if array=[1,2,3,4,5,6,7] then:

  • get_batch(array,3,0) returns [1,2,3]

  • get_batch(array,3,2) returns [7]
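get_batch amounts to a slice starting at batch_size * batch_number. A minimal sketch (the name `slice_batch` is hypothetical) using the seven-element array from the example: with batch_size=3 there are three batches, and the last one (batch_number=2) holds the single leftover element.

```python
def slice_batch(array, batch_size, batch_number):
    """Return the batch_number-th (zero-indexed) slice of length batch_size; the last may be shorter."""
    start = batch_size * batch_number
    return array[start : start + batch_size]


array = [1, 2, 3, 4, 5, 6, 7]
print(slice_batch(array, 3, 0))  # [1, 2, 3]
print(slice_batch(array, 3, 2))  # [7]
```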

opensoundscape.ml.export module

class opensoundscape.ml.export.SequentialModelExporter(*args: Any, **kwargs: Any)[source]

Bases: Module

forward(x)[source]
opensoundscape.ml.export.to_onnx_program(preprocessing_transforms, torch_model, input_length, activation_layer=None, include_preprocessor_output=True, include_embedding_output=True, include_classifier_output=True, opset_version=18, **kwargs)[source]

Export a torch model with preprocessing transforms to ONNX format

Attempts to separate embedding and classifier outputs from torch_model, if torch_model has attribute ‘classifier_layer’ indicating the name of the layer that should be considered the “classifier”. The remaining layers are considered the “embedding” portion of the network. There should be no layers after the classifier layer.

Optionally adds a sigmoid or softmax activation layer on the classifier outputs.

Requires that the onnx, onnxruntime, and onnxscript packages are installed

Parameters:
  • preprocessing_transforms – torch.nn.Module, preprocessing transforms to apply to raw audio

  • torch_model – torch.nn.Module, model to export

  • input_length – int, length of input audio samples in number of samples

  • activation_layer – str or None, activation layer to apply to classifier outputs options: None, ‘softmax’, ‘sigmoid’

  • include_preprocessor_output – bool, whether to include preprocessor output in ONNX model outputs

  • include_embedding_output – bool, whether to include embedding output in ONNX model outputs

  • include_classifier_output – bool, whether to include classifier output in ONNX model outputs

  • opset_version – int, ONNX opset version to use for export; currently defaults to 18 because of issues with dynamic shapes in opset 20 with PyTorch 2.9.0 (should be upgraded to 20 once stable fixes are released)

  • **kwargs – additional keyword arguments to pass to torch.onnx.export

Returns:

ONNX program model object

Return type:

onnx_model

Example:

```python
from opensoundscape import CNN, preprocessors
from opensoundscape.ml.export import to_onnx_program

model = CNN(
    architecture="efficientnet_b0",
    classes=[0, 1, 2, 3],
    sample_duration=3,
    preprocessor_cls=preprocessors.TorchSpectrogramPreprocessor,
    sample_rate=32000,
)
# a list of torchaudio preprocessing transforms such as Spectrogram, MelSpectrogram, etc.
transforms = model.preprocessor["transform"].transforms

# expected number of samples in input audio: 3 * 32000
input_length = int(model.preprocessor.sample_rate * model.preprocessor.sample_duration)

onnx_program = to_onnx_program(
    preprocessing_transforms=transforms,
    torch_model=model.network,
    input_length=input_length,
    include_preprocessor_output=True,
)
onnx_program.save("efficientnet_b0.onnx")
```
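The rule that no layers may follow the classifier layer can be illustrated on an ordered list of layer names. This is a hypothetical helper (`split_at_classifier` is not an opensoundscape function) that only checks and splits names; the actual exporter works on torch modules.

```python
def split_at_classifier(layer_names, classifier_layer):
    """Split an ordered list of layer names into (embedding_layers, classifier_layer).

    Raises ValueError if the classifier layer is missing or is not the final layer.
    """
    if classifier_layer not in layer_names:
        raise ValueError(f"{classifier_layer!r} is not a layer of the model")
    idx = layer_names.index(classifier_layer)
    if idx != len(layer_names) - 1:
        raise ValueError(f"layers found after the classifier layer: {layer_names[idx + 1:]}")
    return layer_names[:idx], classifier_layer


embedding, classifier = split_at_classifier(["features", "avgpool", "flatten", "fc"], "fc")
print(embedding, classifier)  # ['features', 'avgpool', 'flatten'] fc
```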

opensoundscape.ml.song_space module

class opensoundscape.ml.song_space.SongSpace(path, feature_extractor='perch2', sample_duration=None)[source]

Bases: object

SongSpace is a framework for training and applying classifiers, combining a feature extractor and database

A SongSpace couples a feature extractor (e.g., BirdNET or Perch) with a database that stores embeddings of audio clips. One or more shallow classifiers can be added, along with labeled training and evaluation datasets.

It provides utilities for:

  • ingesting audio datasets by saving their deep learning embeddings in a database

  • creating and evaluating (shallow) classifiers

  • applying a classifier to embeddings in a hoplite database, with filtering by metadata and scores

  • selecting top-scoring or random clips from the database based on classifier predictions and filters

  • embedding-based similarity search

The main purpose of this class is to enable users to easily complete an active learning loop:

  • start with a few labeled samples and a bunch of unlabeled audio

  • embed everything

  • use similarity search, shallow classifiers, or targeted/random search to find clips

  • review clips and label more data

  • apply the final classifier to select clips for manual verification

  • end with manually verified detections for downstream analysis

  • potentially repeat with other species/classes

Parameters:
  • path (str) – The path to the SongSpace directory

  • feature_extractor (str or model) – The feature extractor to use for embedding audio clips. Can be a string key for a model in the bioacoustics model zoo (“bs-convnext”, “birdnet”, “perch”, “perch2”) or a custom model object with an embed() method and a classifier attribute with an in_features property indicating the embedding dimension.

  • sample_duration (float) – duration of audio clips to embed and classify, in seconds; if None, uses the default sample duration of the feature extractor

add_classifier(name, model)[source]

Add a classifier to the SongSpace with a given name

property database

The database object used to store embeddings for this SongSpace

Implemented as a property to protect it from accidental modification

property db

alias for self.database

evaluate(classifier_name, dataset_name, batch_size=1024)[source]

Evaluate a classifier on a specified dataset and return metrics

fit_classifier(classes, train_datasets, validation_dataset, weak_negatives_proportion=2, batch_size=128, steps=1000, optimizer=None, criterion=None, device='cpu', early_stopping_patience=None, logging_interval=100, validation_interval=1, classifier_hidden_layers=(), weak_negatives_weight=0.01)[source]

Fit a classifier on embeddings from the database for a given dataset

Note: Before fitting a classifier, ingest and create audio datasets with ingest_audio()

Parameters:
  • classes – list of class names to train the classifier for; if None, trains for every class in the dataset(s)

  • train_datasets – list of dataset names to use for training; must have been added with ingest_audio()

  • validation_dataset – dataset name to use for validation; if None, skips validation

  • weak_negatives_proportion – ratio of weak negatives to positives to add to the training data. Selects random unlabeled samples from the database and treats them as no-species samples, but with a small weight in the loss function. The default of 2 means adding 2 weak negatives for every labeled sample; if 0, does not add any weak negatives. Ignored if criterion is passed

  • embedding_batch_size – batch size for embedding; default 1

  • embedding_num_workers – number of workers for embedding; default 0

  • batch_size – model fitting parameters, see fit()

  • steps – model fitting parameters, see fit()

  • optimizer – model fitting parameters, see fit()

  • criterion – model fitting parameters, see fit()

  • device – model fitting parameters, see fit()

  • early_stopping_patience – if provided, training will stop early if validation loss doesn’t improve for this many steps (not validation evaluations) [Default: None, which means no early stopping]

  • logging_interval – how often to print training progress; progress is logged every logging_interval steps when validation is performed

  • validation_interval – how often to validate the model during training; if validation_dataset is provided, validation is performed every validation_interval steps

  • audio_root – if provided, used as prefix for audio files in train_df and validation_df; if None, assumes train_df and validation_df already have absolute audio paths

  • classifier_hidden_layers – tuple of hidden layer sizes for the MLPClassifier; default is () for no hidden layers (i.e. linear probe / logistic regression)

  • weak_negatives_weight – weight for the weak negative samples in the loss function; default 0.01; ignored if criterion is passed

Returns: new classifier
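The weak-negative scheme described for weak_negatives_proportion and weak_negatives_weight can be sketched with numpy: randomly chosen unlabeled embeddings are appended with all-zero labels and a small per-sample weight, which then scales their contribution to a binary cross-entropy loss. `add_weak_negatives` and `weighted_bce` are hypothetical names, and normalizing the loss by the total weight is an assumption; the library trains an MLPClassifier in torch.

```python
import numpy as np


def add_weak_negatives(x_labeled, y_labeled, x_unlabeled, proportion=2, weight=0.01, seed=None):
    """Append random unlabeled clips as all-zero ("no species") labels with a small weight."""
    rng = np.random.default_rng(seed)
    n_weak = min(int(proportion * len(x_labeled)), len(x_unlabeled))
    idx = rng.choice(len(x_unlabeled), size=n_weak, replace=False)
    x = np.concatenate([x_labeled, x_unlabeled[idx]])
    y = np.concatenate([y_labeled, np.zeros((n_weak, y_labeled.shape[1]))])
    w = np.concatenate([np.ones(len(x_labeled)), np.full(n_weak, weight)])
    return x, y, w


def weighted_bce(probs, targets, sample_weights, eps=1e-7):
    """Binary cross-entropy averaged over classes, then weighted-averaged over samples."""
    probs = np.clip(probs, eps, 1 - eps)
    per_sample = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)).mean(axis=1)
    return float((per_sample * sample_weights).sum() / sample_weights.sum())


x_l = np.ones((3, 4))
y_l = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
x_u = np.zeros((10, 4))
x, y, w = add_weak_negatives(x_l, y_l, x_u, proportion=2, weight=0.01, seed=0)
print(x.shape, w.sum())  # 6 weak negatives appended to 3 labeled samples
```

The small weight lets the model learn from plentiful unlabeled audio, which is mostly negative for a rare species, without letting occasional unlabeled positives dominate the loss.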

get_dataset(name)[source]

return labels_df for dataset name

get_dataset_embeddings(dataset_name)[source]

Utility to get the embeddings and labels for a given dataset as numpy arrays

ingest_audio(samples, dataset_name, file_to_deployment=<function parent_folder_name>, allow_training=True, audio_root=None, embedding_exists_mode='skip', file_to_datetime=aru_metadata_parser.parse.ARUFileTimestampParser.parse, **kwargs)[source]

Embed samples using the feature extractor and store in a new or existing dataset

Parameters:
  • samples – dataframe with columns “file”, “start_time”, “end_time” specifying clips to embed

  • dataset_name – name of the dataset to store the embeddings in:

    • if existing, combines with the existing dataset of the same name, taking the new labels in the case of conflicts

    • if not existing, creates a new dataset with the given name, using allow_training and audio_root to set up the dataset parameters

    Also uses dataset_name as the ‘project’ name for the deployment in the database

  • file_to_deployment – str, function, or dictionary mapping filenames to deployment names:

    • if function, should take a single argument (filename: str) and return a deployment name (str)

    • if dictionary or pd.Series, should map filenames (str) to deployment names (str)

    • if str, the name of the deployment that all samples will be associated with

    • if a deployment does not exist in the db, it will be created

    Utility functions for common patterns are provided in opensoundscape.utils, including parent_folder_name, two_parents_name, second_parent_name, and filename_first_part (an LLM can also help write a custom function for your deployment/audio file structure)

  • allow_training – if True, allows using this dataset for training classifiers; if False, dataset can still be used for validation but not training; default True

  • audio_root – if provided, used as prefix for audio files in samples; if None, assumes samples already have absolute audio paths. If full paths and audio_root are both provided, converts to relative paths by stripping audio_root from the start of the paths in samples before embedding and storing in the database (see also: update_dataset_audio_root() to update audio_root if you move the entire audio dataset)

  • embedding_exists_mode – ‘skip’, ‘error’, or ‘add’ [default: ‘skip’]; how to handle cases where an embedding already exists in the database:

    • skip: skip embedding and keep the existing embedding

    • error: raise an error if an embedding already exists for a clip in samples

    • add: add a new embedding alongside the existing one (e.g. for augmented variations of the same clip)

  • file_to_datetime – optional function or dictionary mapping filenames to datetime objects, used to set recording start times in the database. Default: uses a flexible parser from aru_metadata_parser.parse that handles most formats

  • **kwargs – additional keyword arguments to pass to the feature extractor’s embed() method
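The three accepted forms of file_to_deployment can be resolved with a small dispatcher. Both functions below are hypothetical illustrations rather than opensoundscape code; `grandparent_folder_name` assumes a `<deployment>/<card>/<file>.wav` folder layout, which you would adapt to your own structure.

```python
from pathlib import Path


def grandparent_folder_name(file):
    """Hypothetical custom mapping: .../<deployment>/<card>/<file>.wav -> <deployment>."""
    return Path(file).parent.parent.name


def resolve_deployment(file, file_to_deployment):
    """Resolve a deployment name from a str, a callable, or a mapping of filename -> name."""
    if isinstance(file_to_deployment, str):
        return file_to_deployment  # every file goes to the same deployment
    if callable(file_to_deployment):
        return file_to_deployment(file)
    return file_to_deployment[file]  # dict-like lookup by filename


print(resolve_deployment("/data/site_A/card1/x.wav", grandparent_folder_name))  # site_A
```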

list_classifiers()[source]

List the names of the classifiers currently in the SongSpace

list_datasets()[source]

List the names of the datasets currently in the SongSpace

metrics(predictions, labels, classes)[source]

Compute evaluation metrics for a set of predictions and true labels

classmethod open(path, feature_extractor=None)[source]

Open an existing SongSpace from a specified path

if the feature_extractor is not one of the registered bioacoustics model zoo options (“bs-convnext”, “birdnet”, “perch”, “perch2”), create the feature extractor used previously, then pass it to this method.

predict_on_dataset(classifier_name, dataset_name, batch_size=1024, return_df=True)[source]

Apply a classifier to a dataset and return predictions as a dataframe with the same index as the dataset’s label_df and columns for each class

remove_classifier(name)[source]

Remove a classifier from the SongSpace by name

remove_dataset(name)[source]

Remove a dataset from the SongSpace by name

save()[source]

Save the SongSpace metadata to the SongSpace path, so that it can be re-loaded later with SongSpace.open()

select(classifier, classes, k=5, strategy: Literal['top_k', 'random_k', 'all'] = 'top_k', batch_size=1024, date_range=None, time_range=None, min_score=None, max_score=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, random_state=None, return_windows=False, progress_bar=False, warn_no_matches=False)[source]

Extract top-scoring or random clips from the database based on classifier predictions and filters

Parameters:
  • db – hoplite database containing embeddings

  • classifier – classifier to apply to embeddings in the database to generate clip ranking scores: an MLPClassifier object or other classifier object to call on the torch.tensor embeddings, or the name (str) of a classifier in the self.classifiers dictionary. Must have a ‘classes’ attribute listing the class names, including the classes specified in the classes argument

  • classes – list of class names to select clips for; if None, selects clips for every class in classifier

  • k – number of clips to return per class; default 5 (ignored if strategy=”all”)

  • strategy – which clips to select; default “top_k”:

    • “top_k”: return the top k clips for each class

    • “random_k”: return k random clips

    • “all”: return all clips (ignores k)

  • batch_size – number of samples processed simultaneously when applying classifier to embeddings; default 1024

  • date_range – tuple of (start_date, end_date) to filter clips by date; formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format. If None, does not filter by date. Can pass (date, None) or (None, date) to filter by only start or end date, respectively

  • time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day. Formats: datetime.datetime, datetime.time, or string in “HH:MM:SS” format. Note: filters by time of day of the _recording_ start time (rather than the audio clip start time), and assumes the time zone of the time_range values matches the recording timestamps in the database

  • min_score – minimum score to filter clips by existing score in the database; if None, does not threshold by min score

  • max_score – maximum score to filter clips by existing score in the database; if None, does not restrict by max score

  • deployments – list of deployment names to filter by; if None, does not filter by deployment

  • projects – list of project names to filter by; if None, does not filter by project

  • recordings – list of recording names to filter by; if None, does not filter by recording

  • deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument

  • recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument

  • windows_filter – custom filter dict for windows; if provided, overrides date_range, time_range arguments

  • annotations_filter – custom filter dict for annotations in hoplite DB

  • warn_no_matches – if True, raises a warning if no clips are found for a class after applying filters and score thresholds; default False

Returns:

dict of {class_name: list of matching windows} if return_windows=True; otherwise a dataframe with columns for class, score, and window info

Find the k most similar embeddings in the database to each query audio sample

Parameters:
  • query_samples – audio file path, list of files, or dataframe with columns “file”, “start_time”, “end_time” specifying clips to embed and search for

  • k – number of similar samples to return; default 5

  • exact_search – default (False) uses an approximate nearest neighbor search for speed; if True, uses exact search for maximum recall but slower speed

  • search_subset_size – if provided, limits the search to a random subset of all samples

  • target_score – if provided, returns samples close to the target similarity score rather than _most_ similar samples - useful for finding samples that are similar but not too similar to the query samples

  • audio_root – if provided, used as prefix for audio files in query_samples; if None, assumes query_samples already have absolute audio paths

  • search_kwargs – dict of additional keyword arguments passed to db.ui.search() (if exact_search=False) or brutalism.threaded_brute_search() (if exact_search=True):

    • exact_search=False: radius, threads, exact, log, progress

    • exact_search=True: batch_size, max_workers, rng_seed

  • **embedding_kwargs – additional keyword arguments passed to self.embed(), such as batch_size and num_workers

Returns:

A dataframe with the same columns as the database metadata and an additional ‘similarity’ column, sorted by similarity to the query embedding
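Exact search (exact_search=True) can be thought of as brute-force nearest-neighbor ranking over every stored embedding. The sketch below assumes cosine similarity as the similarity score, which may not match the metric the database uses, and shows how target_score changes the ranking from "most similar" to "closest to the target score". `cosine_top_k` is a hypothetical name.

```python
import numpy as np


def cosine_top_k(query, embeddings, k=5, target_score=None):
    """Exact (brute-force) cosine-similarity search.

    Returns (indices, similarities). With target_score, ranks by closeness to that score instead.
    """
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q  # cosine similarity of every stored embedding to the query
    if target_score is None:
        order = np.argsort(-sims)  # most similar first
    else:
        order = np.argsort(np.abs(sims - target_score))  # closest to target first
    top = order[:k]
    return top, sims[top]


emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])
idx, sims = cosine_top_k(query, emb, k=2)
print(idx)  # [0 2]
```

Approximate search trades a little recall for much lower cost on large databases, which is why it is the default.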

stratified_selection(classifier, classes, stratify_deployments=True, stratify_day=False, date_ranges=None, stratify_recordings=False, stratify_datasets=False, k=5, strategy: Literal['top_k', 'random_k', 'all'] = 'top_k', batch_size=1024, date_range=None, time_range=None, min_score=None, max_score=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None, random_state=None, progress_bar=False, warn_no_matches=False)[source]

Perform stratified selection of clips based on classifier predictions and filters

Parameters:
  • classifier – classifier to apply to embeddings in the database to generate clip ranking scores; see select() for details

  • classes – list of class names to select clips for; if None, selects clips for every class in classifier

  • stratify_deployments – whether to stratify selection by deployment; default True

  • stratify_day – whether to stratify selection by day; default False

  • date_ranges – optional list of inclusive (start_date, end_date) ‘YYYY-MM-DD’ tuples for stratification - if provided, stratify_day is ignored and stratification is based on these date ranges instead

  • stratify_recordings – whether to stratify selection by recording (audio file); default False

  • stratify_datasets – whether to stratify selection by dataset; default False

  • k – number of clips to return per class per stratum; default 5 (ignored if strategy=”all”)

  • strategy – which clips to select; default “top_k”:

    • “top_k”: return the top k clips for each class in each stratum

    • “random_k”: return k random clips for each class in each stratum

    • “all”: return all clips (ignores k)

  • batch_size, date_range, time_range, min_score, max_score, deployments, projects, recordings, deployments_filter, recordings_filter, windows_filter, annotations_filter, random_state, return_windows, progress_bar, warn_no_matches – see select() for details on these arguments

Returns:

dict of {stratum_value: {class_name: list of matching windows}} if return_windows=True; otherwise a dataframe with columns for stratum_value, class, score, and window info
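The effect of stratification with strategy="top_k" can be sketched for a single class: clips are grouped by stratum value (for example, deployment), and the top k are kept within each group, so one high-scoring stratum cannot dominate the selection. `stratified_top_k` is a hypothetical helper operating on plain tuples, not the library implementation.

```python
from collections import defaultdict


def stratified_top_k(clips, k=5):
    """Group clips by stratum, then keep the k highest-scoring clip ids in each stratum.

    clips: iterable of (stratum_value, clip_id, score) tuples for a single class.
    """
    by_stratum = defaultdict(list)
    for stratum, clip, score in clips:
        by_stratum[stratum].append((score, clip))
    return {
        stratum: [clip for _, clip in sorted(items, reverse=True)[:k]]
        for stratum, items in by_stratum.items()
    }


clips = [
    ("site_A", "a1", 0.9),
    ("site_A", "a2", 0.4),
    ("site_A", "a3", 0.7),
    ("site_B", "b1", 0.1),
    ("site_B", "b2", 0.8),
]
print(stratified_top_k(clips, k=2))  # {'site_A': ['a1', 'a3'], 'site_B': ['b2', 'b1']}
```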

update_dataset_audio_root(name, new_audio_root)[source]

Update the audio_root for a given dataset, which is used as the prefix for audio file paths when embedding new samples and searching for existing embeddings in the database

This is useful if you need to move your audio files after ingesting a dataset, or if you originally ingested with incorrect audio paths.

Note that this does not change the file paths in the label_df, but rather updates the audio_root that is prefixed to those file paths when embedding new samples or searching for existing embeddings in the database.
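Storing relative paths and prefixing them with audio_root is what makes moving a dataset a single update. A minimal sketch with pathlib, assuming POSIX-style paths; `to_relative` and `resolve` are hypothetical helpers, not opensoundscape functions.

```python
from pathlib import PurePosixPath


def to_relative(path, audio_root):
    """Strip audio_root from the start of an absolute path."""
    return str(PurePosixPath(path).relative_to(audio_root))


def resolve(relative_path, audio_root):
    """Prefix a stored relative path with the (possibly updated) audio_root."""
    return str(PurePosixPath(audio_root) / relative_path)


rel = to_relative("/old_drive/audio/site_A/x.wav", "/old_drive/audio")  # stored path
print(resolve(rel, "/new_drive/audio"))  # /new_drive/audio/site_A/x.wav
```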

Module contents