Subpackages
- opensoundscape.localization package
- Submodules
- opensoundscape.localization.audiomoth_sync module
- opensoundscape.localization.localization_algorithms module
- opensoundscape.localization.position_estimate module
- opensoundscape.localization.spatial_event module
- opensoundscape.localization.synchronized_recorder_array module
- opensoundscape.localization.utils module
- Module contents
- opensoundscape.ml package
- Submodules
- opensoundscape.ml.cam module
- opensoundscape.ml.cnn module
CNN, ChannelDimCheckError, SpectrogramClassifier, SpectrogramClassifier.batch_forward(), SpectrogramClassifier.current_step, SpectrogramClassifier.device, SpectrogramClassifier.early_stopping_config, SpectrogramClassifier.embed(), SpectrogramClassifier.embed_to_hoplite_db(), SpectrogramClassifier.eval(), SpectrogramClassifier.generate_cams(), SpectrogramClassifier.generate_samples(), SpectrogramClassifier.load(), SpectrogramClassifier.load_weights(), SpectrogramClassifier.log_file, SpectrogramClassifier.logging_level, SpectrogramClassifier.loss_hist, SpectrogramClassifier.name, SpectrogramClassifier.per_class_metrics(), SpectrogramClassifier.predict(), SpectrogramClassifier.profile(), SpectrogramClassifier.run_evaluation(), SpectrogramClassifier.sample_duration, SpectrogramClassifier.sample_rate, SpectrogramClassifier.save(), SpectrogramClassifier.save_onnx(), SpectrogramClassifier.save_weights(), SpectrogramClassifier.similarity_search_hoplite_db(), SpectrogramClassifier.train(), SpectrogramClassifier.verbose
SpectrogramModule, SpectrogramModule.change_classes(), SpectrogramModule.change_classifier(), SpectrogramModule.classifier, SpectrogramModule.compute_per_class_metrics, SpectrogramModule.freeze_feature_extractor(), SpectrogramModule.freeze_layers_except(), SpectrogramModule.lr_scheduler_step, SpectrogramModule.network, SpectrogramModule.single_target, SpectrogramModule.unfreeze()
get_channel_dim(), list_model_classes(), load_model(), register_model_cls(), use_resample_loss()
- opensoundscape.ml.cnn_architectures module
alexnet(), change_conv2d_channels(), change_fc_output_size(), densenet121(), efficientnet_b0(), efficientnet_b1(), efficientnet_b4(), freeze_params(), generic_make_arch(), list_architectures(), register_arch(), resnet101(), resnet152(), resnet18(), resnet34(), resnet50(), set_layer_from_name(), squeezenet1_0(), unfreeze_params(), vgg11_bn()
- opensoundscape.ml.dataloaders module
- opensoundscape.ml.datasets module
AudioFileDataset, AudioFileDataset.audio_root, AudioFileDataset.bypass_augmentations, AudioFileDataset.class_counts(), AudioFileDataset.classes, AudioFileDataset.from_categorical_df(), AudioFileDataset.head(), AudioFileDataset.invalid_samples, AudioFileDataset.label_df, AudioFileDataset.preprocessor, AudioFileDataset.sample()
EmbeddingDataset, HopliteDataset, InvalidIndexError, NoMatchingWindowIDsError
- opensoundscape.ml.lightning module
- opensoundscape.ml.loss module
- opensoundscape.ml.safe_dataset module
- opensoundscape.ml.sampling module
- opensoundscape.ml.shallow_classifier module
- opensoundscape.ml.utils module
- opensoundscape.ml.export module
- opensoundscape.ml.song_space module
SongSpace, SongSpace.add_classifier(), SongSpace.database, SongSpace.db, SongSpace.evaluate(), SongSpace.fit_classifier(), SongSpace.get_dataset(), SongSpace.get_dataset_embeddings(), SongSpace.ingest_audio(), SongSpace.list_classifiers(), SongSpace.list_datasets(), SongSpace.metrics(), SongSpace.open(), SongSpace.predict_on_dataset(), SongSpace.remove_classifier(), SongSpace.remove_dataset(), SongSpace.save(), SongSpace.select(), SongSpace.similarity_search(), SongSpace.stratified_selection(), SongSpace.update_dataset_audio_root()
- Module contents
- opensoundscape.preprocess package
- Submodules
- opensoundscape.preprocess.action_functions module
adaptive_random_gain(), adaptive_random_noise(), audio_add_noise(), audio_random_gain(), audio_time_mask(), frequency_mask(), image_to_tensor(), list_action_fns(), pcen(), random_lowpass(), random_wrap_audio(), register_action_fn(), scale_tensor(), tensor_add_noise(), time_mask(), torch_color_jitter(), torch_random_affine()
- opensoundscape.preprocess.actions module
- opensoundscape.preprocess.img_augment module
- opensoundscape.preprocess.io module
- opensoundscape.preprocess.overlay module
- opensoundscape.preprocess.preprocessors module
- opensoundscape.preprocess.tensor_augment module
- opensoundscape.preprocess.utils module
- Module contents
Submodules
opensoundscape.annotations module
functions and classes for manipulating annotations of audio
includes BoxedAnnotations class and utilities to combine or “diff” annotations, etc.
- class opensoundscape.annotations.BoxedAnnotations(df=None, annotation_files=None, audio_files=None)[source]
Bases: object
Container for “boxed” (frequency-time) annotations of audio (for instance, annotations created in Raven software)
includes functionality to load annotations from Pandas DataFrame or Raven Selection tables (.txt files), output one-hot labels for specific clip lengths or clip start/end times, apply corrections/conversions to annotations, and more.
Contains some analogous functions to Audio and Spectrogram, such as trim() [limit time range] and bandpass() [limit frequency range]
the .df attribute is a Pandas DataFrame containing the annotations with time and frequency bounds
the .annotation_files and .audio_files attributes are lists of annotation and audio file paths, respectively. They are retained as a record of _what audio was annotated_, rather than what annotations were placed on the audio. For instance, an audio file may have no entries in the dataframe if it contains no annotations, but is listed in audio_files because it was annotated/reviewed.
- annotation_files
- audio_files
- bandpass(low_f, high_f, edge_mode='trim')[source]
Bandpass a set of annotations, analogous to Spectrogram.bandpass()
Reduces the range of annotation boxes overlapping with the bandpass limits, and removes annotation boxes entirely if they lie completely outside of the bandpass limits.
Out-of-place operation: does not modify itself, returns new object
- Parameters:
low_f – low frequency (Hz) bound
high_f – high frequency (Hz) bound
edge_mode – what to do when boxes overlap with edges of trim region
- ‘trim’: trim boxes to bounds
- ‘keep’: allow boxes to extend beyond bounds
- ‘remove’: completely remove boxes that extend beyond bounds
- Returns:
a copy of the BoxedAnnotations object on the bandpassed region
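The three edge_mode options can be illustrated on the frequency range of a single box. The following is a minimal pure-Python sketch of the behavior described above, not the library's implementation; the helper name bandpass_box is hypothetical, and the handling of boxes that exactly touch a limit is an assumption.

```python
def bandpass_box(box_low, box_high, low_f, high_f, edge_mode="trim"):
    """Return the (low, high) frequency range of one box after bandpassing,
    or None if the box is removed. Hypothetical helper for illustration."""
    # Boxes lying completely outside the limits are removed in every mode.
    # (Treating a box that only touches a limit as "outside" is an assumption.)
    if box_high <= low_f or box_low >= high_f:
        return None
    extends_beyond = box_low < low_f or box_high > high_f
    if edge_mode == "trim":
        return (max(box_low, low_f), min(box_high, high_f))
    if edge_mode == "keep":
        return (box_low, box_high)
    if edge_mode == "remove":
        return None if extends_beyond else (box_low, box_high)
    raise ValueError(f"unknown edge_mode: {edge_mode!r}")


# A box spanning 1-5 kHz, bandpassed to 2-4 kHz
print(bandpass_box(1000, 5000, 2000, 4000, "trim"))    # (2000, 4000)
print(bandpass_box(1000, 5000, 2000, 4000, "keep"))    # (1000, 5000)
print(bandpass_box(1000, 5000, 2000, 4000, "remove"))  # None
```

With the real class, the same choice is made once for all boxes through the edge_mode argument of bandpass().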
- clip_labels(clip_duration, min_label_overlap, min_label_fraction=None, full_duration=None, class_subset=None, audio_files=None, return_type='multihot', keep_duplicates=False, **kwargs)[source]
Generate one-hot labels for clips of fixed duration
wraps utils.make_clip_df() with self.labels_on_index()
- Clips are created in the same way as Audio.split()
- Labels are applied based on overlap, using self.labels_on_index()
- Parameters:
clip_duration (float) – The duration in seconds of the clips
min_label_overlap – minimum duration (seconds) of annotation within the time interval for it to count as a label. Note that any annotation of length less than this value will be discarded. We recommend a value of 0.25 for typical bird songs, or shorter values for very short-duration events such as chip calls or nocturnal flight calls.
min_label_fraction – [default: None] if >= this fraction of an annotation overlaps with the time window, it counts as a label regardless of its duration. Note that if either of the two criteria (overlap and fraction) is met, the label is 1. If None (default), this criterion is not used (i.e., only min_label_overlap is used). A value of 0.5 for this parameter would ensure that all annotations result in at least one clip being labeled 1 (if there are no gaps between clips).
full_duration – The amount of time (seconds) to split into clips for each file. Can be a float, list, or None:
- if None, attempts to get each file’s duration using librosa.get_duration(path=file) where file is the value of audio for each row of self.df
- if float: uses this fixed duration for all files
- if list: should be same length as audio_files, giving duration for each file
class_subset – list of classes for one-hot labels. If None, classes will be all unique values of self.df[‘annotation’]
audio_files – list of audio file paths (as str or pathlib.Path) to create clips for. If None, uses self.audio_files. [default: None]
return_type – one of ‘multihot’, ‘integers’, ‘classes’, or ‘CategoricalLabels’:
- ‘multihot’: [default] returns a dataframe with a column for each class and 0/1 values for class presence.
- ‘integers’: returns a dataframe with ‘labels’ column containing lists of integer class indices for each clip, corresponding to the classes list; also returns a second value, the list of class names
- ‘classes’: returns a dataframe with ‘labels’ column containing lists of class names for each clip
- ‘CategoricalLabels’: returns a CategoricalLabels object
keep_duplicates – [default: False] if True, allows multiple annotations of a class to be retained for a single clip; e.g. labels [‘a’, ‘a’, ‘b’]. Ignored if return_type is ‘multihot’.
**kwargs (such as clip_step, final_clip) – passed to opensoundscape.utils.generate_clip_times_df() via make_clip_df()
- Returns: depends on return_type argument
- ‘multihot’: [default] returns a dataframe with a column for each class and 0/1 values for class presence.
- ‘integers’: returns a dataframe with ‘labels’ column containing lists of integer class indices for each clip, corresponding to the classes list; also returns a second value, the list of class names
- ‘classes’: returns a dataframe with ‘labels’ column containing lists of class names for each clip; also returns a second value, the list of class names
- ‘CategoricalLabels’: returns a CategoricalLabels object
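How min_label_overlap and min_label_fraction interact is easiest to see on a small example. The sketch below is pure Python and only illustrates the rule described above (a label is 1 if either criterion is met); it is not the library's implementation, and the helper names counts_as_label and multihot_clip_labels are hypothetical. It assumes back-to-back clips with no overlap and discards any incomplete final clip.

```python
def counts_as_label(a_start, a_end, c_start, c_end,
                    min_label_overlap, min_label_fraction=None):
    """True if the annotation [a_start, a_end] labels the clip [c_start, c_end]."""
    overlap = max(0.0, min(a_end, c_end) - max(a_start, c_start))
    # criterion 1: enough seconds of the annotation fall inside the clip
    if overlap > 0 and overlap >= min_label_overlap:
        return True
    # criterion 2 (optional): a large enough fraction of the annotation is inside
    duration = a_end - a_start
    if min_label_fraction is not None and duration > 0:
        return overlap / duration >= min_label_fraction
    return False


def multihot_clip_labels(annotations, classes, clip_duration, full_duration,
                         min_label_overlap, min_label_fraction=None):
    """Return [(clip_start, clip_end, [0/1 per class]), ...] for fixed-length clips."""
    rows = []
    for i in range(int(full_duration // clip_duration)):
        c_start, c_end = i * clip_duration, (i + 1) * clip_duration
        row = [0] * len(classes)
        for a_start, a_end, label in annotations:
            if label in classes and counts_as_label(
                a_start, a_end, c_start, c_end, min_label_overlap, min_label_fraction
            ):
                row[classes.index(label)] = 1
        rows.append((c_start, c_end, row))
    return rows


# 'b' is a 0.25 s annotation straddling the boundary between the two clips
annotations = [(1.0, 2.5, "a"), (2.875, 3.125, "b")]
overlap_only = multihot_clip_labels(annotations, ["a", "b"], 3.0, 6.0, 0.25)
with_fraction = multihot_clip_labels(annotations, ["a", "b"], 3.0, 6.0, 0.25, 0.5)
print([row for _, _, row in overlap_only])   # [[1, 0], [0, 0]]
print([row for _, _, row in with_fraction])  # [[1, 1], [0, 1]]
```

With only min_label_overlap=0.25, the short annotation ‘b’ labels no clip because just 0.125 s falls in each; adding min_label_fraction=0.5 labels both clips.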
- classmethod concat(list_of_boxed_annotations)[source]
concatenate a list of BoxedAnnotations objects into one
- convert_labels(conversion_table)[source]
modify annotations according to a conversion table
Changes the values of ‘annotation’ column of dataframe. Any labels that do not have specified conversions are left unchanged.
Returns a new BoxedAnnotations object, does not modify itself (out-of-place operation). Typical usage: my_annotations = my_annotations.convert_labels(table)
- Parameters:
conversion_table – current values -> new values. Can be either:
- pd.DataFrame with 2 columns [current value, new values], or
- dictionary {current values: new values}
- Returns:
new BoxedAnnotations object with converted annotation labels
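The dictionary form of the conversion table behaves like a lookup with a fallback to the original value. Here is a minimal sketch of that rule on a plain list of labels, not the library's code; the function name and the label values are made up for illustration.

```python
def convert_label_list(labels, conversion_table):
    """Return a new list; labels without an entry in the table are left unchanged."""
    return [conversion_table.get(label, label) for label in labels]


original = ["NOCA", "amro", "unknown"]  # hypothetical annotation labels
table = {"amro": "AMRO"}                # current value -> new value
converted = convert_label_list(original, table)
print(converted)  # ['NOCA', 'AMRO', 'unknown']
print(original)   # ['NOCA', 'amro', 'unknown'] (the input is not modified)
```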
- df
- classmethod from_crowsetta(annotations, audio_files=None, annotation_files=None)[source]
create BoxedAnnotations from crowsetta.Annotation object or list of Annotation objects
- Parameters:
annotations – crowsetta.Annotation object or list of objects. The objects have _either_ .bbox: list of BBox objects, OR .seq: Sequence object with list of values for onset/offset (or sample onset/offset) and labels
audio_files – optionally, pass list of the annotated audio files (this might include files with zero annotations)
annotation_files – optionally, pass list of files containing annotations
- Returns:
BoxedAnnotations object containing the annotations in .df, and possibly containing the provided .audio_files and .annotation_files lists
Note: if an empty list is passed, creates empty BoxedAnnotations object
- classmethod from_crowsetta_bbox(bbox, audio_file, annotation_file)[source]
create BoxedAnnotations object from a crowsetta.BBox object
- Parameters:
bbox – a crowsetta.BBox object
audio_file – (str) path of annotated audio file
annotation_file – (str) path of annotation file
- Returns:
BoxedAnnotations object
this classmethod is used by from_crowsetta()
- classmethod from_crowsetta_seq(seq, audio_file, annotation_file)[source]
create BoxedAnnotations from crowsetta.Sequence object
Note: low_f and high_f will be None since Sequence does not contain information about frequency
Note: the .df of the returned BoxedAnnotations retains the Sequence’s .onset_samples and .offset_samples information, but only uses the Sequence’s .onsets_s and .offsets_s (which may sometimes be None) for the start_time and end_time columns in BoxedAnnotations.df.
- Parameters:
seq – a crowsetta.Sequence object
audio_file – (str) path of annotated audio file
annotation_file – (str) path of annotation file
- Returns:
BoxedAnnotations object
this classmethod is used by from_crowsetta()
- classmethod from_csv(path)[source]
load csv from path and create a BoxedAnnotations object
Note: the .annotation_files and .audio_files attributes will be None
- Parameters:
path – file path of csv. see __init__() docstring for required column names
- Returns:
BoxedAnnotations object
- classmethod from_raven_files(raven_files, annotation_column, audio_files=None, keep_extra_columns=True, column_mapping_dict=None, warn_no_annotations=False)[source]
load annotations from Raven .txt files
- Parameters:
raven_files – list or iterable of raven .txt file paths (as str or pathlib.Path), or a single file path (str or pathlib.Path). Eg [‘path1.txt’,’path2.txt’]
annotation_column – column name(s) or integer position to use as the annotations
- pass None to load the Raven file without explicitly assigning a column as the annotation column. The resulting object’s .df will have an annotation column with nan values!
- if a string is passed, the column with this name will be used as the annotations.
- if an integer is passed, the column at that position will be used as the annotation column. NOTE: column positions are ordered increasingly starting at 0.
- if a list/tuple is passed, find a column matching any value in the list. NOTE: if multiple columns match, an error will be raised. Example: [‘annotation’, ‘label’, ‘Species’] will find a column with any of these names
audio_files – (list) optionally specify audio files corresponding to each raven file (length should match raven_files) Eg [‘path1.txt’,’path2.txt’] - if None (default), .clip_labels() will not be able to check the duration of each audio file, and will raise an error unless full_duration is passed as an argument
keep_extra_columns – keep or discard extra Raven file columns (always keeps start_time, end_time, low_f, high_f, annotation, audio_file). [default: True]
- True: keep all
- False: keep none
- or iterable of specific columns to keep
column_mapping_dict – dictionary mapping Raven column names to desired column names in the output dataframe. The columns of the loaded Raven file are renamed according to this dictionary. The resulting dataframe must contain: [‘start_time’, ‘end_time’, ‘low_f’, ‘high_f’] [default: None] If None (or for any unspecified columns), will use the standard column names:
{
“Begin Time (s)”: “start_time”,
“End Time (s)”: “end_time”,
“Low Freq (Hz)”: “low_f”,
“High Freq (Hz)”: “high_f”,
}
This dictionary will be updated with any user-specified mappings.
warn_no_annotations – [default: False] if True, will issue a warning if a Raven file has zero rows (meaning no annotations present).
- Returns:
BoxedAnnotations object containing annotations from the Raven files (the .df attribute is a dataframe containing each annotation)
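The way column_mapping_dict combines with the standard column names can be sketched with a plain dictionary update. This is an illustration of the documented rule only, not the library's code; the helper names and the extra “Delta Time (s)” mapping are assumptions chosen for the example.

```python
# Standard Raven -> output column names, as listed in the documentation above
DEFAULT_MAPPING = {
    "Begin Time (s)": "start_time",
    "End Time (s)": "end_time",
    "Low Freq (Hz)": "low_f",
    "High Freq (Hz)": "high_f",
}


def build_column_mapping(column_mapping_dict=None):
    """Start from the defaults, then apply any user-specified mappings."""
    mapping = dict(DEFAULT_MAPPING)  # copy so the defaults are never modified
    if column_mapping_dict is not None:
        mapping.update(column_mapping_dict)
    return mapping


def rename_columns(columns, mapping):
    """Rename the columns that have a mapping; leave the others as they are."""
    return [mapping.get(name, name) for name in columns]


raven_columns = ["Selection", "Begin Time (s)", "End Time (s)",
                 "Low Freq (Hz)", "High Freq (Hz)", "Delta Time (s)"]
mapping = build_column_mapping({"Delta Time (s)": "duration"})  # hypothetical extra mapping
print(rename_columns(raven_columns, mapping))
```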
- global_multi_hot_labels(classes)[source]
make list of 0/1 for presence/absence of classes across all annotations
- Parameters:
classes – iterable of class names to give 0/1 labels
- Returns:
list of 0/1 labels for each class
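As a rough illustration of the return value, the sketch below computes presence/absence from a plain list of annotation labels. It is an assumption-level stand-in for what the method does with the ‘annotation’ column of self.df; the function name and labels are hypothetical.

```python
def global_multi_hot(annotation_labels, classes):
    """1 if the class occurs anywhere in the annotations, else 0, in the order of classes."""
    present = set(annotation_labels)
    return [1 if cls in present else 0 for cls in classes]


labels_in_df = ["a", "b", "a"]  # hypothetical contents of the 'annotation' column
print(global_multi_hot(labels_in_df, ["a", "b", "c"]))  # [1, 1, 0]
```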
- labels_on_index(clip_df, min_label_overlap, min_label_fraction=None, class_subset=None, return_type='multihot', keep_duplicates=False, warn_no_annotations=False)[source]
create a dataframe of clip labels based on given starts/ends.
- Format of label dataframe depends on return_type argument:
- ‘multihot’: [default] returns a dataframe with a column for each class and 0/1 values for class presence.
- ‘integers’: returns a dataframe with ‘labels’ column containing lists of integer class indices for each clip, corresponding to the classes list; also returns a second value, the list of class names
- ‘classes’: returns a dataframe with ‘labels’ column containing lists of class names for each clip
- ‘CategoricalLabels’: returns a CategoricalLabels object
Uses start and end clip times from clip_df to define a set of clips for each file. Then extracts annotations overlapping with each clip.
The overlap required for an annotation to count toward a clip is defined by the user: an annotation must satisfy the minimum time overlap OR minimum % overlap to be included (doesn’t require both conditions to be met, only one)
clip_df can be created using opensoundscape.utils.make_clip_df
See also: .clip_labels(), which creates equal-length clips automatically and can often be used instead of this function.
- Parameters:
clip_df – dataframe with (file, start_time, end_time) MultiIndex specifying the temporal bounds of each clip (clip_df can be created using opensoundscape.utils.make_clip_df)
min_label_overlap – minimum duration (seconds) of annotation within the time interval for it to count as a label. Note that any annotation of length less than this value will be discarded. We recommend a value of 0.25 for typical bird songs, or shorter values for very short-duration events such as chip calls or nocturnal flight calls.
min_label_fraction – [default: None] if >= this fraction of an annotation overlaps with the time window, it counts as a label regardless of its duration. Note that if either of the two criteria (overlap and fraction) is met, the label is 1. If None (default), this criterion is not used (i.e., only min_label_overlap is used). A value of 0.5 for this parameter would ensure that all annotations result in at least one clip being labeled 1 (if there are no gaps between clips).
class_subset – list of classes for one-hot labels. If None, classes will be all unique values of self.df[‘annotation’]
return_type – one of ‘multihot’, ‘integers’, ‘classes’, or ‘CategoricalLabels’:
- ‘multihot’: [default] returns a dataframe with a column for each class and 0/1 values for class presence.
- ‘integers’: returns a dataframe with ‘labels’ column containing lists of integer class indices for each clip, corresponding to the classes list; also returns a second value, the list of class names
- ‘classes’: returns a dataframe with ‘labels’ column containing lists of class names for each clip
- ‘CategoricalLabels’: returns a CategoricalLabels object
keep_duplicates – [default: False] if True, allows multiple annotations of a class to be retained for a single clip; e.g. labels [‘a’, ‘a’, ‘b’]. Ignored if return_type is ‘multihot’
warn_no_annotations – bool [default:False] if True, raises warnings for any files in clip_df with no corresponding annotations in self.df
- Returns: depends on return_type argument
- ‘multihot’: [default] returns a dataframe with a column for each class and 0/1 values for class presence.
- ‘integers’: returns a dataframe with ‘labels’ column containing lists of integer class indices for each clip, corresponding to the classes list; also returns a second value, the list of class names
- ‘classes’: returns a dataframe with ‘labels’ column containing lists of class names for each clip; also returns a second value, the list of class names
- ‘CategoricalLabels’: returns a CategoricalLabels object
- subset(classes)[source]
subset annotations to those from a list of classes
out-of-place operation (returns new filtered BoxedAnnotations object)
- Parameters:
classes – list of classes to retain (all others are discarded)
- the list can include nan or None if you want to keep them
- Returns:
new BoxedAnnotations object containing only annotations in classes
- to_crowsetta(mode='bbox', ignore_annotation_id=False, ignore_sequence_id=False)[source]
create crowsetta.Annotation objects
Creates (at least) one crowsetta.Annotation object per unique combination of audio_file, annotation_file in self.df
- if annotation_id column is present, creates one Annotation object per unique value of annotation_id per unique combination of audio_file and annotation_file
- if sequence_id column is present and mode==’sequence’, creates one Sequence for each unique sequence_id within an Annotation object (Annotation.seq will be a list of Sequences). (If sequence_id is not in the columns, Annotation.seq will just be a Sequence object).
- Parameters:
mode – ‘bbox’ or ‘sequence’
- if mode==’bbox’, Annotations have attribute .bboxes
- if mode==’sequence’, Annotations have attribute .seq: list of Sequences, one Sequence for each unique value of sequence_id
ignore_annotation_id – [default: False] if True, creates one Annotation object per unique combination of audio_file and annotation_file, ignoring annotation_id. Otherwise, creates separate objects for each unique annotation_id for each unique combination of audio_file and annotation_file.
ignore_sequence_id – [default: False] if True, creates one Sequence object for Annotation.seq, ignoring sequence_id. Otherwise, Annotation.seq will be a list of Sequence objects, one for each unique sequence_id in the subset of annotations being created for a single Annotation object. Note: Only relevant for mode=’sequence’
- Returns:
list of crowsetta.Annotation objects (one per unique value of audio_file in self.df)
- if mode==’bbox’, Annotations have attribute .bboxes
- if mode==’sequence’, Annotations have attribute .seq
- to_csv(path)[source]
save annotation table as csv-formatted text file
Note: the .annotation_files and .audio_files attributes are not saved, only .df is retained in the generated csv file
- Parameters:
path – file path to save to
- Effects:
creates a text file containing comma-delimited contents of self.df
- to_raven_files(save_dir, audio_files=None)[source]
save annotations to Raven-compatible tab-separated text files
Creates one file per unique audio file in ‘file’ column of self.df
- Parameters:
save_dir – directory for saved files - can be str or pathlib.Path
audio_files – list of audio file paths (as str or pathlib.Path) or None [default: None]. If None, uses self.audio_files. Note that it does not use self.df[‘audio_file’].unique()
- Outcomes:
creates files containing the annotations for each audio file in a format compatible with Raven Pro/Lite. File is tab-separated and contains columns matching the Raven defaults.
Note: Raven Lite does not support additional columns beyond a single annotation column. Additional columns will not be shown in the Raven Lite interface.
- train_test_split(**kwargs)[source]
split annotations into train and test sets
Splits annotations into train and test sets by audio file, not by row, such that all annotations for a given audio file are in either the train or test set. This is useful for ensuring that the same audio files are not present in both the train and test sets, which could lead to data leakage.
Note that because self.audio_files is used, this approach retains audio files that do not have any annotations (which would not be true if the bounding box .df table were split based on the audio file column)
self.audio_files must be set to the list of annotated audio files
- Parameters:
**kwargs – see sklearn.model_selection.train_test_split for arguments (test_size, train_size, random_state, shuffle, stratify)
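Splitting by audio file rather than by row can be sketched as follows. This is a simplified stand-in that uses only the standard library instead of sklearn.model_selection.train_test_split; the function name, the dictionary layout of the annotations, and the file names are hypothetical.

```python
import random


def split_by_file(audio_files, annotations, test_size=0.25, random_state=0):
    """Assign whole files to train or test, then route each annotation by its file."""
    files = list(audio_files)
    random.Random(random_state).shuffle(files)
    n_test = max(1, round(len(files) * test_size))
    test_files = set(files[:n_test])
    train_files = set(files[n_test:])
    train_rows = [a for a in annotations if a["audio_file"] in train_files]
    test_rows = [a for a in annotations if a["audio_file"] in test_files]
    return (sorted(train_files), train_rows), (sorted(test_files), test_rows)


# d.wav was reviewed but contains no annotations; it still belongs to exactly one split
audio_files = ["a.wav", "b.wav", "c.wav", "d.wav"]
annotations = [
    {"audio_file": "a.wav", "annotation": "x"},
    {"audio_file": "a.wav", "annotation": "y"},
    {"audio_file": "b.wav", "annotation": "x"},
    {"audio_file": "c.wav", "annotation": "y"},
]
(train_files, train_rows), (test_files, test_rows) = split_by_file(audio_files, annotations)
print(train_files, test_files)
```

Because files (not rows) are shuffled, no file can contribute annotations to both sets, and files without annotations are retained in one of the two file lists.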
- trim(start_time, end_time, edge_mode='trim')[source]
Trim the annotations of each file in time
Trims annotations from outside of the time bounds. Note that the annotation start and end times of different files may not represent the same real-world times. This function only uses the numeric values of annotation start and end times in the annotations, which should be relative to the beginning of the corresponding audio file.
For zero-length annotations (start_time = end_time), start and end times are inclusive on the left and exclusive on the right, i.e. [lower, upper). For instance, start_time=0, end_time=1 includes zero-length annotations at 0 but excludes zero-length annotations at 1.
Out-of-place operation: does not modify itself, returns new object
- Parameters:
start_time – time (seconds) since beginning for left bound
end_time – time (seconds) since beginning for right bound
edge_mode – what to do when boxes overlap with edges of trim region
- ‘trim’: trim boxes to bounds
- ‘keep’: allow boxes to extend beyond bounds
- ‘remove’: completely remove boxes that extend beyond bounds
- Returns:
a copy of the BoxedAnnotations object on the trimmed region. - note that, like Audio.trim(), there is a new reference point for 0.0 seconds (located at start_time in the original object). For example, calling .trim(5,10) will result in an annotation previously starting at 6s to start at 1s in the new object.
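The combination of edge handling, the [lower, upper) rule for zero-length annotations, and the shifted time reference can be sketched for a single annotation. This is an illustrative pure-Python helper under stated assumptions, not the library's implementation: the name trim_annotation is hypothetical, and the treatment of non-zero-length annotations that only touch a bound is assumed.

```python
def trim_annotation(start, end, start_time, end_time, edge_mode="trim"):
    """Return the (start, end) of one annotation relative to the trimmed region,
    or None if the annotation is dropped."""
    if start == end:
        # zero-length annotations: inclusive on the left, exclusive on the right
        if not (start_time <= start < end_time):
            return None
    elif end <= start_time or start >= end_time:
        return None  # no overlap with the trimmed region
    extends_beyond = start < start_time or end > end_time
    if edge_mode == "remove" and extends_beyond:
        return None
    if edge_mode == "trim":
        start, end = max(start, start_time), min(end, end_time)
    elif edge_mode not in ("keep", "remove"):
        raise ValueError(f"unknown edge_mode: {edge_mode!r}")
    # times are re-referenced so that start_time becomes 0.0
    return (start - start_time, end - start_time)


print(trim_annotation(6, 8, 5, 10))          # (1, 3)
print(trim_annotation(4, 7, 5, 10, "trim"))  # (0, 2)
print(trim_annotation(4, 7, 5, 10, "keep"))  # (-1, 2)
print(trim_annotation(1, 1, 0, 1))           # None
```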
- class opensoundscape.annotations.CategoricalLabels(files, start_times, end_times, labels, classes=None, integer_labels=False)[source]
Bases: object
- property class_labels
list of lists of class names for each row in self.df
- classmethod from_categorical_labels_df(df, classes, integer_labels=False)[source]
df has columns of ‘file’, ‘start_time’, ‘end_time’, ‘labels’ with labels as list of class names (integer_labels=False) or list of integer class indices (integer_labels=True)
- Parameters:
df (pd.DataFrame) – dataframe with columns of ‘file’, ‘start_time’, ‘end_time’, ‘labels’
classes (list) – list of str class names or list of integer class indices
integer_labels (bool) – if True, labels are integer class indices, otherwise labels are class names
- classmethod from_multihot_df(df)[source]
instantiate from dataframe of 0/1 labels across samples & classes
- Parameters:
df (pd.DataFrame) – dataframe with multi-index of ‘file’,’start_time’,’end_time’; columns are class names, values are 0/1 labels
- property labels
list of lists of integer class indices (corresponding to self.classes) for each row in self.df
- property multihot_dense
2d array of multi-hot (0/1) labels across self.df.index and self.classes
- property multihot_df_dense
dataframe of multi-hot (0/1) labels across self.df.index and self.classes
- property multihot_df_sparse
sparse dataframe of multi-hot (0/1) labels across self.df.index and self.classes
- multihot_labels_at_index(index)[source]
multi-hot (list of 0/1 for self.classes) labels at a specific numeric index
- property multihot_sparse
sparse 2d scipy.sparse.csr_matrix of multi-hot (0/1) labels across self.df.index and self.classes
- opensoundscape.annotations.categorical_to_integer_labels(labels, classes)[source]
Convert a list of categorical labels to a list of numeric labels
- opensoundscape.annotations.categorical_to_multi_hot(labels, classes=None, sparse=False)[source]
transform multi-target categorical labels (list of lists) to one-hot array
- Parameters:
labels – list of lists of categorical labels, eg [[‘white’,’red’],[‘green’,’white’]] or [[0,1,2],[3]]
classes – [default: None] list of classes for one-hot labels. If None, taken to be the unique set of values in labels
sparse – bool [default: False] if True, returns a scipy.sparse.csr_matrix
- Returns: tuple (multi_hot, class_subset)
multi_hot: 2d array with 0 for absent and 1 for present
class_subset: list of classes corresponding to columns in the array
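The transformation itself is small enough to sketch with nested lists. The following is not the library's implementation: it returns plain lists rather than an array, has no sparse option, and two details are assumptions made for the example (classes are sorted when inferred, and labels not in classes are ignored).

```python
def categorical_to_multi_hot_sketch(labels, classes=None):
    """Return (multi_hot, classes) for a list of lists of categorical labels."""
    if classes is None:
        # assumption: inferred classes are returned in sorted order
        classes = sorted({label for sample in labels for label in sample})
    column = {cls: i for i, cls in enumerate(classes)}
    multi_hot = [[0] * len(classes) for _ in labels]
    for row, sample in zip(multi_hot, labels):
        for label in sample:
            if label in column:  # assumption: labels outside `classes` are skipped
                row[column[label]] = 1
    return multi_hot, classes


multi_hot, class_subset = categorical_to_multi_hot_sketch(
    [["white", "red"], ["green", "white"]]
)
print(class_subset)  # ['green', 'red', 'white']
print(multi_hot)     # [[0, 1, 1], [1, 0, 1]]
```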
- opensoundscape.annotations.check_crowsetta_installed()[source]
raise helpful error if optional dependency not installed
- opensoundscape.annotations.diff(base_annotations, comparison_annotations)[source]
look at differences between two BoxedAnnotations objects. Not implemented.
Compare different labels of the same boxes (Assumes that a second annotator used the same boxes as the first, but applied new labels to the boxes)
- opensoundscape.annotations.find_overlapping_idxs_in_clip_df(file, annotation_start, annotation_end, clip_df, min_label_overlap, min_label_fraction=None)[source]
Finds the (file, start_time, end_time) index values for the rows in the clip_df that overlap with the annotation_start and annotation_end
- Parameters:
file – audio file path/name the annotation corresponds to; clip_df is filtered to this file
annotation_start – start time of the annotation
annotation_end – end time of the annotation
clip_df – dataframe with multi-index [‘file’, ‘start_time’, ‘end_time’]
min_label_overlap – minimum duration (seconds) of annotation within the time interval for it to count as a label. Note that any annotation of length less than this value will be discarded. We recommend a value of 0.25 for typical bird songs, or shorter values for very short-duration events such as chip calls or nocturnal flight calls.
min_label_fraction – [default: None] if >= this fraction of an annotation overlaps with the time window, it counts as a label regardless of its duration. Note that if either of the two criteria (overlap and fraction) is met, the label is 1. If None (default), this criterion is not used (i.e., only min_label_overlap is used). A value of 0.5 for this parameter would ensure that all annotations result in at least one clip being labeled 1 (if there are no gaps between clips).
- Returns:
[(file, start_time, end_time)]
- opensoundscape.annotations.integer_to_categorical_labels(labels, classes)[source]
Convert a list of numeric labels to a list of categorical labels
- opensoundscape.annotations.integer_to_multi_hot(labels, n_classes, sparse=False)[source]
transform integer labels to multi-hot array
- Parameters:
labels – list of lists of integer labels, eg [[0,1,2],[3]]
n_classes – number of classes
- Returns:
if sparse is False: 2d np.array with False for absent and True for present
if sparse is True: scipy.sparse.csr_matrix with 0 for absent and 1 for present
- opensoundscape.annotations.multi_hot_to_categorical(labels, classes)[source]
transform multi-hot (2d array of 0/1) labels to multi-target categorical (list of lists)
- Parameters:
labels – 2d array or scipy.sparse.csr_matrix with 0 for absent and 1 for present
classes – list of classes corresponding to columns in the array
- Returns:
list of lists of categorical labels for each sample, eg [[‘white’, ‘red’], [‘green’, ‘white’]] or [[0,1,2],[3]]
- opensoundscape.annotations.multi_hot_to_integer_labels(labels)[source]
transform multi-hot (2d array of 0/1) labels to multi-target categorical (list of lists of integer class indices)
- Parameters:
labels – 2d array or scipy.sparse.csr_matrix with 0 for absent and 1 for present
- Returns:
list of lists of categorical labels for each sample, eg [[0,1,2],[3]] where 0 corresponds to column 0 of labels
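The integer and multi-hot representations are inverse views of the same labels, which a round trip makes concrete. The sketch below uses nested lists of 0/1 rather than arrays or sparse matrices and is only an illustration of the conversions described above, not the library's code; the function names carry a _sketch suffix to mark them as stand-ins.

```python
def integer_to_multi_hot_sketch(labels, n_classes):
    """[[0, 1, 2], [3]] -> one 0/1 row per sample, with n_classes columns."""
    multi_hot = [[0] * n_classes for _ in labels]
    for row, sample in zip(multi_hot, labels):
        for class_index in sample:
            row[class_index] = 1
    return multi_hot


def multi_hot_to_integer_labels_sketch(labels):
    """One 0/1 row per sample -> list of the column indices that are set."""
    return [[i for i, value in enumerate(row) if value] for row in labels]


integer_labels = [[0, 1, 2], [3]]
as_multi_hot = integer_to_multi_hot_sketch(integer_labels, n_classes=4)
print(as_multi_hot)                                      # [[1, 1, 1, 0], [0, 0, 0, 1]]
print(multi_hot_to_integer_labels_sketch(as_multi_hot))  # [[0, 1, 2], [3]]
```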
opensoundscape.audio module
audio.py: Utilities for loading and modifying Audio objects
Note: Out-of-place operations
Functions that modify Audio (and Spectrogram) objects are “out of place”,
meaning that they return a new Audio object instead of modifying the
original object. This means that running a line
`
audio_object.resample(22050) # WRONG!
`
will not change the sample rate of audio_object!
If your goal was to overwrite audio_object with the new,
resampled audio, you would instead write
`
audio_object = audio_object.resample(22050)
`
- class opensoundscape.audio.Audio(samples, sample_rate, resample_type='soxr_hq', metadata=None)[source]
Bases: object
Container for audio samples
Initialization requires sample array. To load audio file, use Audio.from_file()
Initializing an Audio object directly requires the specification of the sample rate. Use Audio.from_file or Audio.from_bytesio with sample_rate=None to use a native sampling rate.
- Parameters:
samples (np.array) – The audio samples
sample_rate (integer) – The sampling rate for the audio samples
resample_type (str) – The resampling method to use [default: “soxr_hq”]
- Returns:
An initialized Audio object
- apply(function, clip_duration, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='extend', rounding_precision=10, **kwargs)[source]
Apply a function to windowed signal, return (times,values)
- Parameters:
function – a function that takes an Audio object as its first argument and returns a value (e.g. scalar, array, etc.)
clip_duration – duration (seconds) of each window
clip_overlap – overlap (seconds) of each window [default: None]
overlap_fraction – fraction of overlap (0 to 1) of each window [default: None]
clip_step – step size (seconds) between windows [default: None]
final_clip –
behavior if final_clip is less than clip_duration seconds long [default: “extend”]. Possible options (any other input will ignore the final clip entirely):
“remainder”: Include the remainder of the Audio (clip will not have clip_duration length)
“full”: Increase the overlap to yield a clip with clip_duration length
“extend”: Similar to remainder but extend (repeat) the clip to reach clip_duration length
None: Discard the remainder
rounding_precision – number of decimal places to round clip start and end times - helps avoid floating point issues when generating clip times
return_df – if True, returns dataframe with the computed value as ‘value’ column in a dataframe with clip start_time and end_time columns
**kwargs – additional keyword arguments to pass to the function
- Returns:
(window start times, values)
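How clip_duration, clip_overlap and final_clip interact can be sketched as follows. This is an illustration under stated assumptions, not the library's windowing code: the helper `clip_times` is hypothetical, and for "extend" it simply reports an end time past the end of the audio instead of repeating samples.

```python
def clip_times(total_duration, clip_duration, clip_overlap=0.0,
               final_clip=None, rounding_precision=10):
    """Return (start, end) times in seconds for windows over an audio signal."""
    step = clip_duration - clip_overlap
    if step <= 0:
        raise ValueError("clip_overlap must be smaller than clip_duration")

    windows = []
    i = 0
    # Full-length windows that fit entirely inside the audio
    while round(i * step + clip_duration, rounding_precision) <= total_duration:
        start = round(i * step, rounding_precision)
        windows.append((start, round(start + clip_duration, rounding_precision)))
        i += 1

    # Start time of the leftover piece, which is shorter than clip_duration
    start = round(i * step, rounding_precision)
    if start < total_duration:
        if final_clip == "remainder":
            windows.append((start, total_duration))
        elif final_clip == "full":
            # shift the start earlier so the clip has the full length
            windows.append((max(total_duration - clip_duration, 0.0), total_duration))
        elif final_clip == "extend":
            # end time exceeds the audio; the real clip would be extended to fit
            windows.append((start, round(start + clip_duration, rounding_precision)))
        # any other value (including None) discards the leftover piece
    return windows


for mode in (None, "remainder", "full", "extend"):
    print(mode, clip_times(10.0, 4.0, final_clip=mode))
```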
- apply_gain(dB, clip_range=(-1, 1))[source]
apply dB (decibels) of gain to audio signal
Specifically, multiplies samples by 10^(dB/20)
- Parameters:
dB – decibels of gain to apply
clip_range – [minimum,maximum] values for samples - values outside this range will be replaced with the range boundary values. Pass None to preserve original sample values without clipping. [Default: [-1,1]]
- Returns:
Audio object with gain applied to samples
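The gain arithmetic stated above (multiply by 10^(dB/20), then clip) can be sketched on a plain list of samples. The function below is a hypothetical stand-alone helper for illustration, not the method itself.

```python
def apply_gain(samples, dB, clip_range=(-1.0, 1.0)):
    """Scale samples by 10^(dB/20); optionally clip to clip_range."""
    factor = 10 ** (dB / 20)
    scaled = [s * factor for s in samples]
    if clip_range is None:
        return scaled  # keep values even if they exceed [-1, 1]
    low, high = clip_range
    return [min(max(s, low), high) for s in scaled]


# +20 dB is a factor of 10: 0.5 becomes 5.0 and is clipped to 1.0
boosted = apply_gain([0.5, -0.05], 20)
unclipped = apply_gain([0.5, -0.05], 20, clip_range=None)
print(boosted, unclipped)
```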
- bandpass(low_f, high_f, order)[source]
Bandpass audio signal with a butterworth filter
Uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)
- Parameters:
low_f – low frequency cutoff (-3 dB) in Hz of bandpass filter
high_f – high frequency cutoff (-3 dB) in Hz of bandpass filter
order – butterworth filter order (integer) ~= steepness of cutoff
- change_speed(speed_factor, resample=False, resample_type='soxr_hq')[source]
Change the speed (and pitch) of the audio by a given factor
Audio is reversed if speed_factor is negative
- Parameters:
speed_factor – factor by which to change speed - e.g. 2.0 = twice as fast, 0.5 = half as fast
resample – if True, resample the audio back to the original sample rate - if False, the sample rate is adjusted to reflect the speed change [default: False]
resample_type – type of resampling to use if resample=True - see Audio.resample() for options
- Returns:
Audio object with changed speed
- property dBFS
calculate the root-mean-square dB value relative to a full-scale sine wave
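As a sketch of what "relative to a full-scale sine wave" means: a sine with peak 1.0 has an RMS of 1/sqrt(2), so that RMS maps to 0 dBFS. The helpers below are hypothetical and assume this definition; they are not taken from the library source.

```python
import math


def rms(samples):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dbfs(samples):
    """RMS level in dB relative to a full-scale sine (RMS = 1/sqrt(2))."""
    # Raises ValueError for all-zero input, since log10(0) is undefined
    return 20 * math.log10(rms(samples) * math.sqrt(2))


n = 1000
# Ten complete cycles of a sine wave with peak amplitude 1.0
full_scale_sine = [math.sin(2 * math.pi * 10 * k / n) for k in range(n)]
half_scale_sine = [0.5 * s for s in full_scale_sine]
print(dbfs(full_scale_sine), dbfs(half_scale_sine))
```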
- property duration
Calculates the Audio duration in seconds
- extend_by(duration)[source]
Extend audio file by adding duration seconds of silence to the end
- Parameters:
duration – the duration in seconds of silence to add to the end of the audio object
- Returns:
a new Audio object with silence added to the end
- extend_to(duration)[source]
Extend audio file to desired duration by adding silence to the end
If duration is less than or equal to the Audio’s self.duration, the Audio remains unchanged.
Otherwise, silence is added to the end of the Audio object to achieve the desired duration.
- Parameters:
duration – the minimum final duration in seconds of the audio object
- Returns:
a new Audio object of the desired duration
- classmethod from_bytesio(bytesio, sample_rate=None, resample_type='soxr_hq')[source]
Read from bytesio object
Read an Audio object from a BytesIO object. This is primarily used for passing Audio over HTTP.
- Parameters:
bytesio – Contents of WAV file as BytesIO
sample_rate – The final sampling rate of Audio object [default: None]
resample_type – The librosa method to do resampling [default: “soxr_hq”]
- Returns:
An initialized Audio object
- classmethod from_file(path, sample_rate=None, resample_type='soxr_hq', dtype=numpy.float32, load_metadata=True, offset=None, duration=None, start_timestamp=None, out_of_bounds_mode='warn')[source]
Load audio from files
Deal with the various possible input types to load an audio file. Also attempts to load metadata using tinytag.
Audio objects only support mono (one-channel) at this time. Files with multiple channels are mixed down to a single channel. To load multiple channels as separate Audio objects, use load_channels_as_audio()
Optionally, load only a piece of a file using offset and duration. This will efficiently read sections of a .wav file regardless of where the desired clip is in the audio. For mp3 files, access time grows linearly with time since the beginning of the file.
This function relies on librosa.load(), which supports wav natively but requires ffmpeg for mp3 support.
- Parameters:
path (str, Path) – path to an audio file
sample_rate (int, None) – resample audio with value and resample_type, if None use source sample_rate (default: None)
resample_type – method used to resample (default: “soxr_hq”)
dtype – data type of samples returned [Default: np.float32]
load_metadata (bool) – if True, attempts to load metadata from the audio file. If an exception occurs, self.metadata will be None. Otherwise self.metadata is a dictionary. Note: will also attempt to parse AudioMoth metadata from the comment field, if the artist field includes AudioMoth. The parsing function for AudioMoth is likely to break when new firmware versions change the comment metadata field.
offset – load audio starting at this time (seconds) after the start of the file. Defaults to 0 seconds. - cannot specify both offset and start_timestamp
duration – load audio of this duration (seconds) starting at offset. If None, loads all the way to the end of the file.
start_timestamp –
load audio starting at this localized datetime.datetime timestamp - cannot specify both offset and start_timestamp - will only work if loading metadata results in localized datetime object for ‘recording_start_time’ key - will raise AudioOutOfBoundsError if requested time period is not fully contained within the audio file
Example of creating localized timestamp:
`
import pytz
from datetime import datetime
local_timestamp = datetime(2020,12,25,23,59,59)
local_timezone = pytz.timezone('US/Eastern')
timestamp = local_timezone.localize(local_timestamp)
`
out_of_bounds_mode –
‘warn’: generate a warning [default]
‘raise’: raise an AudioOutOfBoundsError
‘ignore’: return any available audio with no warning/error
- Returns:
Audio object with attributes samples, sample_rate, resample_type, metadata (dict or None)
Note: default sample_rate=None means use file’s sample rate, does not resample
- classmethod from_url(url, sample_rate=None, resample_type='kaiser_fast')[source]
Read audio file from URL
Download audio from a URL and create an Audio object
Note: averages channels of multi-channel object to create mono object
- Parameters:
url – Location to download the file from
sample_rate – The final sampling rate of Audio object [default: None] - if None, retains original sample rate
resample_type – The librosa method to do resampling [default: “kaiser_fast”]
- Returns:
Audio object
- highpass(cutoff_f, order)[source]
High-pass audio signal with a butterworth filter
Uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)
Removes low frequencies below cutoff_f and preserves high frequencies
- Parameters:
cutoff_f – cutoff frequency (-3 dB) in Hz of high-pass filter
order – butterworth filter order (integer) ~= steepness of cutoff
- loop(length=None, n=None)[source]
Extend audio file by looping it
- Parameters:
length – the final length in seconds of the looped file (cannot be used with n)[default: None]
n – the number of occurrences of the original audio sample (cannot be used with length) [default: None]. For example, n=1 returns the original sample, and n=2 returns two concatenated copies of the original sample
- Returns:
a new Audio object of the desired length or repetitions
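The two looping modes can be sketched on a plain list of samples. This is a hypothetical helper written for illustration; requiring exactly one of length or n is an assumption based on the parameter descriptions above.

```python
import math


def loop_samples(samples, sample_rate, length=None, n=None):
    """Repeat samples either n times or until `length` seconds are filled."""
    if (length is None) == (n is None):
        raise ValueError("specify exactly one of length or n")
    if n is not None:
        return list(samples) * n
    target = int(round(length * sample_rate))  # number of samples wanted
    repeats = math.ceil(target / len(samples))
    return (list(samples) * repeats)[:target]  # truncate the last repetition


print(loop_samples([1, 2, 3], sample_rate=1, n=2))
print(loop_samples([1, 2, 3], sample_rate=1, length=5))
```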
- lowpass(cutoff_f, order)[source]
Low-pass audio signal with a butterworth filter
Uses a phase-preserving algorithm (scipy.signal’s butter and sosfiltfilt)
Removes high frequencies above cutoff_f and preserves low frequencies
- Parameters:
cutoff_f – cutoff frequency (-3 dB) in Hz of lowpass filter
order – butterworth filter order (integer) ~= steepness of cutoff
- metadata
- classmethod noise(duration, sample_rate, color='white', dBFS=-10)[source]
Create audio object with noise of a desired ‘color’
Set np.random.seed() for reproducible results
Based on an implementation by @Bob in StackOverflow question 67085963
- Parameters:
duration – length in seconds
sample_rate – samples per second
color – any of these colors, which describe the shape of the power spectral density [default: ‘white’]:
white: uniform psd (equal energy per linear frequency band)
pink: psd = 1/sqrt(f) (equal energy per octave)
brownian: psd = 1/f (aka brown noise)
brown: synonym for brownian
violet: psd = f
blue: psd = sqrt(f)
- Returns:
Audio object
Note: Clips samples to [-1,1] which can result in dBFS different from that requested, especially when dBFS is near zero
- normalize(peak_level=None, peak_dBFS=None)[source]
Return audio object with normalized waveform
Linearly scales waveform values so that the max absolute value matches the specified value (default: 1.0)
- Parameters:
peak_level – maximum absolute value of resulting waveform
peak_dBFS – maximum resulting absolute value in decibels Full Scale - for example, -3 dBFS equals a peak level of 0.71 - Note: do not specify both peak_level and peak_dBFS
- Returns:
Audio object with normalized samples
Note: if all samples are zero, returns the original object (avoids division by zero)
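The scaling described above can be sketched on a list of samples. The helper below is hypothetical; it assumes peak_dBFS converts to a linear peak via 10^(dB/20), which is consistent with the stated example of -3 dBFS being roughly 0.71.

```python
def normalize(samples, peak_level=None, peak_dBFS=None):
    """Linearly scale samples so the maximum absolute value hits the target."""
    if peak_level is not None and peak_dBFS is not None:
        raise ValueError("specify at most one of peak_level and peak_dBFS")
    if peak_dBFS is not None:
        peak_level = 10 ** (peak_dBFS / 20)  # dBFS to linear amplitude
    elif peak_level is None:
        peak_level = 1.0  # default target peak
    current_peak = max(abs(s) for s in samples)
    if current_peak == 0:
        return list(samples)  # all zeros: nothing to scale
    scale = peak_level / current_peak
    return [s * scale for s in samples]


print(normalize([0.1, -0.25]))
print(normalize([0.1, -0.25], peak_dBFS=-3))
```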
- pad(pre_duration, post_duration=None, fill=None)[source]
Pad audio by adding silence or noise to the beginning and end
If post_duration is None, it is set equal to pre_duration.
Silence/noise is then added to the beginning and end of the Audio object. If an odd number of samples is needed, the extra sample is added to the end.
- Parameters:
pre_duration – the duration in seconds to pad at the beginning of the audio object
post_duration – the duration in seconds to pad at the end of the audio object - if None, set equal to pre_duration [default: None]
fill – noise color to use for padding. If None, uses silence. Options are the same as Audio.noise(): - ‘white’: uniform psd (equal energy per linear frequency band) - ‘pink’: psd = 1/sqrt(f) (equal energy per octave) - ‘brownian’: psd = 1/f (aka brown noise) - ‘brown’: synonym for brownian - ‘violet’: psd = f - ‘blue’: psd = sqrt(f) [default: None]
- Returns:
a new Audio object of the desired duration
- pad_to(duration, fill=None)[source]
Pad audio file to desired duration by adding silence or noise to the beginning and end
If duration is less than or equal to the Audio’s self.duration, the Audio remains unchanged.
Otherwise, silence/noise is added to the beginning and end of the Audio object to achieve the desired duration. If an odd number of samples is needed, the extra sample is added to the end.
- Parameters:
duration – the minimum final duration in seconds of the audio object
fill – noise color to use for padding. If None, uses silence. Options are the same as Audio.noise(): - ‘white’: uniform psd (equal energy per linear frequency band) - ‘pink’: psd = 1/sqrt(f) (equal energy per octave) - ‘brownian’: psd = 1/f (aka brown noise) - ‘brown’: synonym for brownian - ‘violet’: psd = f - ‘blue’: psd = sqrt(f) [default: None]
- Returns:
a new Audio object of the desired duration
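The rule that an odd number of padding samples puts the extra sample at the end can be sketched as a small calculation. The helper name `pad_to_counts` is hypothetical and only computes how many samples would go on each side.

```python
def pad_to_counts(n_samples, sample_rate, duration):
    """Return (pre, post) sample counts needed to reach `duration` seconds."""
    target = int(round(duration * sample_rate))
    needed = max(target - n_samples, 0)  # already long enough: no padding
    pre = needed // 2
    post = needed - pre  # receives the extra sample when `needed` is odd
    return pre, post


print(pad_to_counts(n_samples=3, sample_rate=1, duration=8))
print(pad_to_counts(n_samples=10, sample_rate=1, duration=8))
```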
- reduce_noise(noisereduce_kwargs=None)[source]
Reduce noise in audio signal using noisereduce package
- Parameters:
noisereduce_kwargs – dictionary of args to pass to noisereduce.reduce_noise
- Returns:
Audio object with noise reduction applied
- resample(sample_rate, resample_type=None, clip_range=(-1, 1))[source]
Resample Audio object
- Parameters:
sample_rate (scalar) – the new sample rate
resample_type (str) – resampling algorithm to use [default: None (uses self.resample_type of instance)]
clip_range – (float,float): min and max value bounding output samples
- Returns:
a new Audio object of the desired sample rate
- resample_type
- property rms
Calculates the root-mean-square value of the audio samples
- sample_rate
- samples
- save(path, metadata_format='opso', suppress_warnings=False, **sf_kwargs)[source]
Save Audio to file
supports all file formats supported by underlying package soundfile, including WAV, MP3, and others
NOTE: saving metadata is only supported for WAV and AIFF formats
Supports writing the following metadata fields: [“title”,”copyright”,”software”,”artist”,”comment”,”date”, “album”,”license”,”tracknumber”,”genre”]
- Parameters:
path – destination for output
metadata_format –
strategy for saving metadata. Can be:
‘opso’ [Default]: Saves metadata dictionary in the comment field as a JSON string. Uses the most recent version of opso_metadata formats.
‘opso_metadata_v0.1’: specify the exact version of opso_metadata to use
‘soundfile’: Saves the default soundfile metadata fields only: ["title","copyright","software","artist","comment","date","album","license","tracknumber","genre"]
None: does not save metadata to file
suppress_warnings – if True, will not warn user when unable to save metadata [default: False]
**sf_kwargs – additional keyword arguments to pass to soundfile.write(). See the docstring of soundfile.write() or soundfile.SoundFile for details.
Examples:
Basic saving with default metadata format: audio.save("output.wav") or audio.save("output.mp3")
Save a specific format and subtype, such as OPUS (use soundfile.available_formats() and soundfile.available_subtypes() to see available formats and subtypes): audio.save("output.ogg", format="OGG", subtype="OPUS")
Specify bitrate (soundfile >= v0.13.1 supports variable bitrate and compression level for mp3 / OPUS formats): audio.save("output.mp3", bitrate_mode="VARIABLE", compression_level=0.8)
Change sample rate before saving: audio.resample(44100).save("output.wav")
- show_widget(normalize=False, autoplay=False)[source]
create and display IPython.display.Audio widget; see that class for docs
- classmethod silence(duration, sample_rate)[source]
Create audio object with zero-valued samples
- Parameters:
duration – length in seconds
sample_rate – samples per second
Note: rounds down to integer number of samples
- spectrum()[source]
Create frequency spectrum from an Audio object using fft
- Parameters:
self
- Returns:
fft, frequencies
- split(clip_duration, **kwargs)[source]
Split Audio into equal-length clips
The Audio object is split into clips of a specified duration and overlap
- Parameters:
clip_duration (float) – The duration in seconds of the clips
**kwargs – additional arguments (such as overlap_fraction, final_clip) passed to opensoundscape.utils.generate_clip_times_df() - extends last Audio object if user passes final_clip == “extend”
- Returns:
audio_clips: list of audio objects
dataframe with columns for start_time and end_time of each clip
- split_and_save(destination, prefix, clip_duration, clip_overlap=0, final_clip='extend', dry_run=False)[source]
Split audio into clips and save them to a folder
- Parameters:
destination – A folder to write clips to
prefix – A name to prepend to the written clips
clip_duration – The duration of each clip in seconds
clip_overlap – The overlap of each clip in seconds [default: 0]
final_clip (str) – Behavior if final_clip is less than clip_duration seconds long [default: “extend”]. Possible options (any other input will ignore the final clip entirely):
“remainder”: Include the remainder of the Audio (clip will not have clip_duration length)
“full”: Increase the overlap to yield a clip with clip_duration length
“extend”: Similar to remainder but extend (repeat) the clip to reach clip_duration length
None: Discard the remainder
dry_run (bool) – If True, skip writing audio and just return clip DataFrame [default: False]
- Returns:
pandas.DataFrame containing paths and start and end times for each clip
- trim(start_time, end_time, out_of_bounds_mode='ignore')[source]
Trim Audio object in time
If start_time is less than zero, output starts from time 0. If end_time is beyond the end of the sample, trims to end of sample.
- Parameters:
start_time – time in seconds for start of extracted clip
end_time – time in seconds for end of extracted clip - if negative, counts from end of audio - if None, extracts to end of audio
out_of_bounds_mode – behavior if requested time period is not fully contained within the audio file. Options: - ‘ignore’: return any available audio with no warning/error [default] - ‘warn’: generate a warning - ‘raise’: raise an AudioOutOfBoundsError
- Returns:
a new Audio object containing samples from start_time to end_time - metadata is updated to reflect new start time and duration
see also: trim_samples() to trim using sample positions instead of times and trim_with_timestamps() to trim using localized datetime.datetime objects
- trim_samples(start_sample, end_sample, out_of_bounds_mode='ignore')[source]
Trim Audio object by sample indices
resulting sample array contains self.samples[start_sample:end_sample]
If start_sample is less than zero, output starts from sample 0. If end_sample is beyond the end of the sample, trims to end of sample.
- Parameters:
start_sample – sample index for start of extracted clip, inclusive
end_sample – sample index for end of extracted clip, exclusive
out_of_bounds_mode – behavior if requested time period is not fully contained within the audio file. Options: - ‘ignore’: return any available audio with no warning/error [default] - ‘warn’: generate a warning - ‘raise’: raise an AudioOutOfBoundsError
- Returns:
a new Audio object containing samples from start_sample to end_sample - metadata is updated to reflect new start time and duration
see also: trim() to trim using time in seconds instead of sample positions and trim_with_timestamps() to trim using localized datetime.datetime objects
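The index clamping and the three out_of_bounds_mode behaviors can be sketched on a plain list. This is an illustration only: `OutOfBoundsError` is a hypothetical stand-in for AudioOutOfBoundsError, and metadata updates are not modeled.

```python
import warnings


class OutOfBoundsError(Exception):
    """Hypothetical stand-in for AudioOutOfBoundsError."""


def trim_samples(samples, start_sample, end_sample, out_of_bounds_mode="ignore"):
    """Return samples[start_sample:end_sample], clamped to the available range."""
    if start_sample < 0 or end_sample > len(samples):
        message = (
            f"requested [{start_sample}, {end_sample}) "
            f"but only {len(samples)} samples exist"
        )
        if out_of_bounds_mode == "raise":
            raise OutOfBoundsError(message)
        if out_of_bounds_mode == "warn":
            warnings.warn(message)
        # "ignore": fall through and return whatever is available
    start = max(start_sample, 0)
    end = min(end_sample, len(samples))
    return samples[start:end]


samples = list(range(10))
print(trim_samples(samples, 2, 5))    # start inclusive, end exclusive
print(trim_samples(samples, -3, 20))  # clamped to the full signal
```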
- trim_with_timestamps(start_timestamp, end_timestamp=None, duration=None, out_of_bounds_mode='warn')[source]
Trim Audio object by localized datetime.datetime timestamps
requires that .metadata[‘recording_start_time’] is a localized datetime.datetime object
- Parameters:
start_timestamp – localized datetime.datetime object for start of extracted clip e.g. datetime(2020,4,4,10,25,15,tzinfo=pytz.UTC)
end_timestamp – localized datetime.datetime object for end of extracted clip e.g. datetime(2020,4,4,10,25,20,tzinfo=pytz.UTC)
duration – duration in seconds of the extracted clip - specify exactly one of duration or end_timestamp
out_of_bounds_mode – behavior if requested time period is not fully contained within the audio file. Options: - ‘ignore’: return any available audio with no warning/error - ‘warn’: generate a warning [default] - ‘raise’: raise an AudioOutOfBoundsError
- Returns:
a new Audio object containing samples from start_timestamp to end_timestamp - metadata is updated to reflect new start time and duration
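Timestamp-based trimming reduces to converting timestamps into offsets in seconds from the recording start. The sketch below shows that conversion with the standard library; the helper name is hypothetical, and it uses datetime.timezone.utc where the examples above use pytz.UTC.

```python
from datetime import datetime, timezone


def timestamp_window_to_seconds(recording_start_time, start_timestamp,
                                end_timestamp=None, duration=None):
    """Convert localized timestamps to (start, end) offsets in seconds."""
    if (end_timestamp is None) == (duration is None):
        raise ValueError("specify exactly one of end_timestamp or duration")
    for ts in (recording_start_time, start_timestamp):
        if ts.tzinfo is None:
            raise ValueError("timestamps must be localized (timezone-aware)")
    start_s = (start_timestamp - recording_start_time).total_seconds()
    if end_timestamp is not None:
        end_s = (end_timestamp - recording_start_time).total_seconds()
    else:
        end_s = start_s + duration
    return start_s, end_s


recording_start = datetime(2020, 4, 4, 10, 25, 0, tzinfo=timezone.utc)
clip_start = datetime(2020, 4, 4, 10, 25, 15, tzinfo=timezone.utc)
clip_end = datetime(2020, 4, 4, 10, 25, 20, tzinfo=timezone.utc)
print(timestamp_window_to_seconds(recording_start, clip_start, end_timestamp=clip_end))
```

The resulting offsets could then be passed to a time-based trim such as trim(start_time, end_time).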
- exception opensoundscape.audio.AudioOutOfBoundsError[source]
Bases: Exception
Custom exception indicating the user tried to load audio outside of the time period that exists in the audio object
- class opensoundscape.audio.MultiChannelAudio(samples, sample_rate, resample_type='soxr_hq', metadata=None)[source]
Bases: Audio
- apply_channel_gain(dB, clip_range=(-1, 1))[source]
apply dB (decibels) of gain to each channel of audio
Specifically, multiplies samples by 10^(dB/20)
- Parameters:
dB – list of float, decibels of gain to apply to each channel; must have length == self.n_channels
clip_range – [minimum,maximum] values for samples - values outside this range will be replaced with the range boundary values. Pass None to preserve original sample values without clipping. [Default: [-1,1]]
- Returns:
Audio object with gain applied to samples
- property duration
Calculates the Audio duration in seconds
- extend_to(duration)[source]
Extend audio file to desired duration by adding silence to the end
If duration is less than or equal to the Audio’s self.duration, the Audio remains unchanged.
Otherwise, silence is added to the end of the Audio object to achieve the desired duration.
- Parameters:
duration – the minimum final duration in seconds of the audio object
- Returns:
a new Audio object of the desired duration
- classmethod from_audio_list(audio_list, sample_rate=None)[source]
Create MultiChannelAudio object from list of Audio objects, one per channel
Metadata and resample_type of new object are copied from first object in list
- Parameters:
audio_list – list of Audio objects
sample_rate – target sample rate, if None uses the sample rate of the first Audio object
- Returns:
MultiChannelAudio object
- classmethod from_file(path, sample_rate=None, resample_type='soxr_hq', dtype=numpy.float32, load_metadata=True, offset=None, duration=None, start_timestamp=None, out_of_bounds_mode='ignore')[source]
Load audio from files
Deal with the various possible input types to load an audio file. Also attempts to load metadata using tinytag.
Audio objects only support mono (one-channel) at this time. Files with multiple channels are mixed down to a single channel. To load multiple channels as separate Audio objects, use load_channels_as_audio()
Optionally, load only a piece of a file using offset and duration. This will efficiently read sections of a .wav file regardless of where the desired clip is in the audio. For mp3 files, access time grows linearly with time since the beginning of the file.
This function relies on librosa.load(), which supports wav natively but requires ffmpeg for mp3 support.
- Parameters:
path (str, Path) – path to an audio file
sample_rate (int, None) – resample audio with value and resample_type, if None use source sample_rate (default: None)
resample_type – method used to resample (default: “soxr_hq”)
dtype – data type of samples returned [Default: np.float32]
load_metadata (bool) – if True, attempts to load metadata from the audio file. If an exception occurs, self.metadata will be None. Otherwise self.metadata is a dictionary. Note: will also attempt to parse AudioMoth metadata from the comment field, if the artist field includes AudioMoth. The parsing function for AudioMoth is likely to break when new firmware versions change the comment metadata field.
offset – load audio starting at this time (seconds) after the start of the file. Defaults to 0 seconds. - cannot specify both offset and start_timestamp
duration – load audio of this duration (seconds) starting at offset. If None, loads all the way to the end of the file.
start_timestamp –
load audio starting at this localized datetime.datetime timestamp - cannot specify both offset and start_timestamp - will only work if loading metadata results in localized datetime object for ‘recording_start_time’ key - will raise AudioOutOfBoundsError if requested time period is not fully contained within the audio file
Example of creating localized timestamp:
`
import pytz
from datetime import datetime
local_timestamp = datetime(2020,12,25,23,59,59)
local_timezone = pytz.timezone('US/Eastern')
timestamp = local_timezone.localize(local_timestamp)
`
out_of_bounds_mode –
‘warn’: generate a warning
‘raise’: raise an AudioOutOfBoundsError
‘ignore’: return any available audio with no warning/error [default]
- Returns:
Audio object with attributes samples, sample_rate, resample_type, metadata (dict or None)
Note: default sample_rate=None means use file’s sample rate, does not resample
- metadata
- property n_channels
- classmethod noise(duration, sample_rate, color='white', dBFS=-10, channels=2)[source]
Create audio object with noise of a desired ‘color’
Set np.random.seed() for reproducible results
Based on an implementation by @Bob in StackOverflow question 67085963
- Parameters:
duration – length in seconds
sample_rate – samples per second
color – any of these colors, which describe the shape of the power spectral density [default: ‘white’]:
white: uniform psd (equal energy per linear frequency band)
pink: psd = 1/sqrt(f) (equal energy per octave)
brownian: psd = 1/f (aka brown noise)
brown: synonym for brownian
violet: psd = f
blue: psd = sqrt(f)
- Returns:
Audio object
Note: Clips samples to [-1,1] which can result in dBFS different from that requested, especially when dBFS is near zero
- resample_type
- sample_rate
- samples
- save(*args, **kwargs)[source]
Save Audio to file
supports all file formats supported by underlying package soundfile, including WAV, MP3, and others
NOTE: saving metadata is only supported for WAV and AIFF formats
Supports writing the following metadata fields: [“title”,”copyright”,”software”,”artist”,”comment”,”date”, “album”,”license”,”tracknumber”,”genre”]
- Parameters:
path – destination for output
metadata_format –
strategy for saving metadata. Can be:
‘opso’ [Default]: Saves metadata dictionary in the comment field as a JSON string. Uses the most recent version of opso_metadata formats.
‘opso_metadata_v0.1’: specify the exact version of opso_metadata to use
‘soundfile’: Saves the default soundfile metadata fields only: ["title","copyright","software","artist","comment","date","album","license","tracknumber","genre"]
None: does not save metadata to file
suppress_warnings – if True, will not warn user when unable to save metadata [default: False]
**sf_kwargs – additional keyword arguments to pass to soundfile.write(). See the docstring of soundfile.write() or soundfile.SoundFile for details.
Examples:
Basic saving with default metadata format: audio.save("output.wav") or audio.save("output.mp3")
Save a specific format and subtype, such as OPUS (use soundfile.available_formats() and soundfile.available_subtypes() to see available formats and subtypes): audio.save("output.ogg", format="OGG", subtype="OPUS")
Specify bitrate (soundfile >= v0.13.1 supports variable bitrate and compression level for mp3 / OPUS formats): audio.save("output.mp3", bitrate_mode="VARIABLE", compression_level=0.8)
Change sample rate before saving: audio.resample(44100).save("output.wav")
- classmethod silence(duration, sample_rate, channels)[source]
Create audio object with zero-valued samples
- Parameters:
duration – length in seconds
sample_rate – samples per second
Note: rounds down to integer number of samples
- spectrum()[source]
Create frequency spectrum from an Audio object using fft
first averages over all of the channels to create a mono signal
- Parameters:
self
- Returns:
fft, frequencies
- split_and_save(destination, prefix, clip_duration, clip_overlap=0, final_clip='extend', dry_run=False)[source]
Split audio into clips and save them to a folder
- Parameters:
destination – A folder to write clips to
prefix – A name to prepend to the written clips
clip_duration – The duration of each clip in seconds
clip_overlap – The overlap of each clip in seconds [default: 0]
final_clip (str) – Behavior if final_clip is less than clip_duration seconds long [default: “extend”]. Possible options (any other input will ignore the final clip entirely):
“remainder”: Include the remainder of the Audio (clip will not have clip_duration length)
“full”: Increase the overlap to yield a clip with clip_duration length
“extend”: Similar to remainder but extend (repeat) the clip to reach clip_duration length
None: Discard the remainder
dry_run (bool) – If True, skip writing audio and just return clip DataFrame [default: False]
- Returns:
pandas.DataFrame containing paths and start and end times for each clip
- trim_samples(*args, **kwargs)[source]
Trim Audio object by sample indices
resulting sample array contains self.samples[start_sample:end_sample]
If start_sample is less than zero, output starts from sample 0. If end_sample is beyond the end of the sample, trims to end of sample.
- Parameters:
start_sample – sample index for start of extracted clip, inclusive
end_sample – sample index for end of extracted clip, exclusive
out_of_bounds_mode – behavior if requested time period is not fully contained within the audio file. Options: - ‘ignore’: return any available audio with no warning/error [default] - ‘warn’: generate a warning - ‘raise’: raise an AudioOutOfBoundsError
- Returns:
a new Audio object containing samples from start_sample to end_sample - metadata is updated to reflect new start time and duration
see also: trim() to trim using time in seconds instead of sample positions and trim_with_timestamps() to trim using localized datetime.datetime objects
- exception opensoundscape.audio.OpsoLoadAudioInputError[source]
Bases: Exception
Custom exception indicating we can’t load input
- opensoundscape.audio.bandpass_filter(signal, low_f, high_f, sample_rate, order=9)[source]
perform a butterworth bandpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving filtering)
- Parameters:
signal – discrete time signal (audio samples, list of float)
low_f – lower -3 dB point of the passband (Hz); frequencies below it are attenuated
high_f – upper -3 dB point of the passband (Hz); frequencies above it are attenuated
sample_rate – samples per second (Hz)
order – higher values -> steeper dropoff [default: 9]
- Returns:
filtered time signal
- opensoundscape.audio.clipping_detector(samples, threshold=0.6)[source]
count the number of samples above a threshold value
- Parameters:
samples – a time series of float values
threshold – minimum value of sample to count as clipping [default: 0.6]
- Returns:
number of samples exceeding threshold
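A literal reading of the description above can be sketched as a simple count. This hypothetical helper compares raw sample values against the threshold, so large negative excursions are not counted; whether the library also considers absolute values is not stated here.

```python
def count_above_threshold(samples, threshold=0.6):
    """Count samples whose value is strictly greater than threshold."""
    return sum(1 for s in samples if s > threshold)


samples = [0.1, 0.7, -0.9, 0.61, 0.6]
print(count_above_threshold(samples))
```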
- opensoundscape.audio.concat(audio_objects, sample_rate=None)[source]
concatenate a list of Audio objects end-to-end
- Parameters:
audio_objects – iterable of Audio objects
sample_rate – target sampling rate - if None, uses sampling rate of _first_ Audio object in list - default: None
Returns: a single Audio object
Notes: discards metadata and retains .resample_type of _first_ audio object
- opensoundscape.audio.estimate_delay(primary_audio, reference_audio, max_delay, frequency_range=None, cc_filter='phat', return_cc_max=False)[source]
Use generalized cross correlation to estimate time delay between 2 audio objects containing the same signal. The audio objects must be time-synchronized. For example, if primary_audio is delayed by 1 second compared to reference_audio, then estimate_delay(primary_audio, reference_audio, max_delay) will return 1.0.
NOTE: Only the central portion of the signal (between start + max_delay and end - max_delay) is used for cross-correlation. This is to avoid edge effects. This means estimate_delay(primary_audio, reference_audio, max_delay) is not necessarily == estimate_delay(reference_audio, primary_audio, max_delay).
- Parameters:
primary_audio – audio object containing the signal of interest
reference_audio – audio object containing the reference signal.
max_delay – maximum time delay to consider, in seconds. Must be less than the duration of the primary audio. (see opensoundscape.signal_processing.tdoa)
frequency_range –
tuple of (low_f, high_f) frequencies in Hz to use in the generalized cross correlation. If None, all frequencies are kept. First or second value can be None if no lower or upper limit is desired. Note: retaining high frequencies near the Nyquist frequency sometimes results in spurious cross correlation values at 0 or at the beginning/end of the signal when using ‘phat’ and ‘scot’ methods.
cc_filter – generalized cross correlation type, see opensoundscape.signal_processing.gcc() [default: ‘phat’]
return_cc_max – if True, returns cross correlation max value as second argument (see opensoundscape.signal_processing.tdoa)
- Returns:
estimated time delay (seconds) from reference_audio to primary_audio
if return_cc_max is True, also returns a second value, the max of the cross correlation of the two signals
Note: resamples reference_audio if its sample rate does not match primary_audio
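The sign convention and the use of only the central portion of the signal can be sketched with a brute-force search over lags. Note that this sketch uses plain (unweighted) time-domain cross correlation, not the generalized cross correlation with ‘phat’ weighting named above, and the function shown is a hypothetical stand-alone helper.

```python
def estimate_delay(primary, reference, sample_rate, max_delay):
    """Delay (seconds) of primary relative to reference, via plain cross correlation."""
    max_lag = int(round(max_delay * sample_rate))
    n = min(len(primary), len(reference))
    if 2 * max_lag >= n:
        raise ValueError("max_delay must be shorter than half the signal")
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only the central portion of primary is used, so every lag
        # compares the same number of samples (avoids edge effects)
        score = sum(
            primary[i] * reference[i - lag] for i in range(max_lag, n - max_lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# reference has a short pulse at samples 8-9; primary has it 3 samples later
reference = [0.0] * 20
reference[8], reference[9] = 1.0, 0.5
primary = [0.0] * 20
primary[11], primary[12] = 1.0, 0.5

delay = estimate_delay(primary, reference, sample_rate=10, max_delay=0.5)
print(delay)  # positive: primary lags behind reference
```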
- opensoundscape.audio.generate_opso_metadata_str(metadata_dictionary, version='v0.1')[source]
generate json string for comment field containing metadata
Preserve Audio.metadata dictionary by dumping to a json string and including it as the ‘comment’ field when saving WAV files.
The string begins with opso_metadata. The contents of the string after this 13 character prefix should be parsable as JSON, and should have a key opso_metadata_version specifying the version of the metadata format, for instance ‘v0.1’.
See also: parse_opso_metadata which parses the string created by this function
- Parameters:
metadata_dictionary – dictionary of audio metadata. Should conform to opso_metadata version. v0.1 should have only strings and floats except the “recording_start_time” key, which should contain a localized (ie has timezone) datetime.datetime object. The datetime is saved as a string in ISO format using datetime.isoformat() and loaded with datetime.fromisoformat().
version – version number of opso_metadata format. Currently implemented: [‘v0.1’]
- Returns:
string beginning with opso_metadata followed by JSON-parseable string containing the metadata.
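The string format described above can be sketched with the standard library alone. This is a minimal illustration, not opensoundscape's implementation: the helper names make_opso_comment and read_opso_comment and the example metadata keys are hypothetical, and only the prefix, the opso_metadata_version key, and the ISO-format datetime handling come from the description.

```python
import json
from datetime import datetime, timezone

PREFIX = "opso_metadata"  # the 13-character prefix described above


def make_opso_comment(metadata, version="v0.1"):
    """Hypothetical helper: dump a metadata dict to a prefixed JSON string."""
    payload = dict(metadata)
    payload["opso_metadata_version"] = version
    start = payload.get("recording_start_time")
    if isinstance(start, datetime):
        # datetime objects are not JSON-serializable, so store the ISO string
        payload["recording_start_time"] = start.isoformat()
    return PREFIX + json.dumps(payload)


def read_opso_comment(comment_string):
    """Hypothetical helper: invert make_opso_comment."""
    if not comment_string.startswith(PREFIX):
        raise ValueError("comment does not start with 'opso_metadata'")
    payload = json.loads(comment_string[len(PREFIX):])
    if "recording_start_time" in payload:
        payload["recording_start_time"] = datetime.fromisoformat(
            payload["recording_start_time"]
        )
    return payload


# example metadata (keys other than recording_start_time are made up)
meta = {
    "recorder": "AudioMoth",
    "recording_start_time": datetime(2024, 5, 1, 6, 30, tzinfo=timezone.utc),
}
comment = make_opso_comment(meta)
restored = read_opso_comment(comment)
```

The round trip preserves the timezone because isoformat() writes the UTC offset and fromisoformat() reads it back.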
- opensoundscape.audio.highpass_filter(signal, cutoff_f, sample_rate, order=9)[source]
perform a butterworth highpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving filtering)
- Parameters:
signal – discrete time signal (audio samples, list of float)
cutoff_f – -3 dB point for highpass filter (Hz)
sample_rate – samples per second (Hz)
order – higher values -> steeper dropoff [default: 9]
- Returns:
filtered time signal
- opensoundscape.audio.load_channels_as_audio(path, sample_rate=None, resample_type='soxr_hq', dtype=numpy.float32, offset=0, duration=None, metadata=True)[source]
Load each channel of an audio file to a separate Audio object
Provides a way to access individual channels, since Audio.from_file mixes down to mono by default
- Parameters:
see Audio.from_file()
- Returns:
list of Audio objects (one per channel)
Note: metadata is copied to each Audio object, but will contain an additional field: “channel”=“1 of 3” for first of 3 channels
- opensoundscape.audio.lowpass_filter(signal, cutoff_f, sample_rate, order=9)[source]
perform a butterworth lowpass filter on a discrete time signal using scipy.signal’s butter and sosfiltfilt (phase-preserving filtering)
- Parameters:
signal – discrete time signal (audio samples, list of float)
cutoff_f – -3 dB point for lowpass filter (Hz)
sample_rate – samples per second (Hz)
order – higher values -> steeper dropoff [default: 9]
- Returns:
filtered time signal
- opensoundscape.audio.mix(audio_objects, duration=None, gain=-3, offsets=None, sample_rate=None, clip_range=(-1, 1))[source]
mixdown (superimpose) Audio signals into a single Audio object
Adds audio samples from multiple audio objects to create a mixdown of Audio samples. Resamples all audio to a consistent sample rate, and optionally applies individual gain and time-offsets to each Audio.
- Parameters:
audio_objects – iterable of Audio objects, or MultiChannelAudio objects
duration – duration in seconds of returned Audio [default: None]. Can be:
- number: extends shorter Audio with silence and truncates longer Audio
- None: extends all Audio to the length of the longest value of (Audio.duration + offset)
gain – number, list of numbers, or None [default: -3 dB on each object]
- number: decibels of gain to apply to all objects
- list of numbers: dB of gain to apply to each object (length must match length of audio_objects)
offsets – list of time-offsets (seconds) for each Audio object For instance [0,1] starts the first Audio at 0 seconds and shifts the second Audio to start at 1.0 seconds - if None, all objects start at time 0 - otherwise, length must match length of audio_objects.
sample_rate – sample rate of returned Audio object - integer: resamples all audio to this sample rate - None: uses sample rate of _first_ Audio object [default: None]
clip_range – minimum and maximum sample values. Samples outside this range will be replaced by the range limit values Pass None to keep sample values without clipping. [default: (-1,1)]
- Returns:
Audio object, or MultiChannelAudio object if input is list of MultiChannelAudio
Notes
Audio metadata is copied from the first object in the list. Resampling of each Audio uses the respective .resample_type of each object. If MultiChannelAudio objects are passed, returns a MultiChannelAudio object with the same number of channels as the maximum number of channels of any input. Objects with fewer channels are zero-padded, and channel[n] is always summed with channel[n] of other objects.
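The mixdown arithmetic can be sketched on plain lists of samples. This is a simplified illustration, not the library's code: mix_samples is a hypothetical helper, all inputs are assumed to share one sample rate already (no resampling), and gain in dB is assumed to be applied as the amplitude factor 10 ** (gain / 20).

```python
def mix_samples(signals, gain_db=-3.0, offsets=None, sample_rate=1,
                clip_range=(-1.0, 1.0)):
    """Hypothetical mixdown of equal-sample-rate signals (lists of floats)."""
    if offsets is None:
        offsets = [0.0] * len(signals)
    if gain_db is None:
        gain_db = 0.0  # no gain change
    if isinstance(gain_db, (list, tuple)):
        gains = list(gain_db)
    else:
        gains = [gain_db] * len(signals)
    # convert time offsets (seconds) to sample positions
    starts = [int(round(o * sample_rate)) for o in offsets]
    # output is as long as the longest (offset + signal length)
    n_out = max(s + len(x) for s, x in zip(starts, signals))
    out = [0.0] * n_out
    for x, start, g in zip(signals, starts, gains):
        factor = 10 ** (g / 20)  # assumed dB-to-amplitude conversion
        for i, v in enumerate(x):
            out[start + i] += factor * v
    if clip_range is not None:
        lo, hi = clip_range
        out = [min(max(v, lo), hi) for v in out]
    return out


clipped = mix_samples([[1.0, 1.0], [1.0]], gain_db=0.0, offsets=[0, 1])
unclipped = mix_samples([[1.0, 1.0], [1.0]], gain_db=0.0, offsets=[0, 1],
                        clip_range=None)
```

With clip_range=None the second sample keeps its summed value of 2.0; with the default range it is limited to 1.0.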
- opensoundscape.audio.parse_metadata(path)[source]
parse metadata from wav file header and return a dictionary
supports parsing of opso metadata format as well as AudioMoth and basic wav headers
for files recorded by AudioMoth firmware, the comment field is parsed into other fields such as recording_start_time and temperature_C
uses SoundFile for file header parsing
- Parameters:
path – file path to audio file
- Returns:
dictionary with key/value pairs of parsed metadata
- opensoundscape.audio.parse_opso_metadata(comment_string)[source]
parse metadata saved by opensoundscape as json in comment field
Parses a json string which opensoundscape saves to the comment metadata field of WAV files to preserve metadata. The string begins with opso_metadata. The contents of the string after this 13 character prefix should be parsable as JSON, and should have a key opso_metadata_version specifying the version of the metadata format, for instance ‘v0.1’.
See also: generate_opso_metadata_str, which generates the string parsed by this function.
- Parameters:
comment_string – a string beginning with opso_metadata followed by JSON parseable dictionary
Returns: dictionary of parsed metadata
- opensoundscape.audio.write_metadata(metadata, metadata_format, path, suppress_warnings=False)[source]
write metadata using one of the supported formats
Currently, only supports writing to .WAV and .AIFF.
metadata fields containing empty strings ‘’ will be replaced by a string containing a single space ‘ ‘ as a workaround to https://github.com/bastibe/python-soundfile/issues/386.
- Parameters:
metadata – dictionary of metadata
metadata_format – one of ‘opso’,’opso_metadata_v0.1’,’soundfile’ (see Audio.wave documentation)
path – file path to save metadata in with soundfile
suppress_warnings – if True, does not warn user when writing to unsupported format
opensoundscape.data_selection module
tools for subsetting and resampling collections
- opensoundscape.data_selection.resample(df, n_samples_per_class, n_samples_without_labels=None, upsample=True, downsample=True, with_replace=False, random_state=None)[source]
resample a one-hot encoded label df for a target n_samples_per_class
Returns a new dataframe with duplicated and/or subset rows. Note that the order of samples changes.
Can enable/disable upsampling (randomly repeating rows) and downsampling (randomly subsetting rows)
- Parameters:
df – dataframe with one-hot encoded labels: columns are classes, index is sample name/path
n_samples_per_class – target number of samples per class
n_samples_without_labels – number of samples with all-0 labels to include in the returned df; None or integer.
- None [default]: keeps all of the original df’s rows that have all-0 labels
- integer > 0: upsample or downsample as needed from original df to achieve this number of rows with all-0 labels
- 0: no all-0 labels are included in the returned df
Note: upsample and downsample arguments are ignored for generating all-0 label samples.
upsample – if True, duplicate samples for classes with <n samples to get to n samples
downsample – if True, randomly sample classes with >n samples to get to n samples
with_replace – flag to enable sampling of the same row more than once, default False
random_state – passed to np.random calls. If None, random state is not fixed.
Note: The algorithm assumes that the label df is single-label. If the label df is multi-label, some classes can end up over-represented.
Note 2: The resulting df will have samples ordered by class label, even if the input df had samples in a random order.
- opensoundscape.data_selection.train_test_split(df, test_size=0.2, random_state=None, by_file=True)[source]
split a multi-index label df into train and evaluation (i.e. validation or test) splits by file
Input dataframes should have a multi-index of (file,start_time,end_time).
By default, the split is done by file name, so that all rows with the same file name are in the same set. This is to prevent data leakage between the sets, since rows with the same file name are likely to be similar. Set by_file=False to split by row instead, which randomly assigns each row to a split.
- Parameters:
df – dataframe with multi-hot encoded labels: multi-index is (file,start_time,end_time)
test_size – proportion of samples to include in the test set
random_state – passed to np.random calls. If None, random state is not fixed.
by_file – if True, split by file name (i.e. all rows with the same file name go in the same set) If False, split by row (i.e. rows with the same file name can be in different sets)
- Returns:
train_df: dataframe with multi-hot encoded labels for the training set
test_df: dataframe with multi-hot encoded labels for the test set
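The by-file split can be illustrated on a plain list of (file, start_time, end_time) index tuples. This is a sketch under assumptions, not the library's method: split_by_file is a hypothetical helper, and it assumes test_size is applied to the number of files rather than the number of rows.

```python
import random


def split_by_file(index, test_size=0.2, random_state=None):
    """Hypothetical helper: split (file, start_time, end_time) rows by file."""
    files = sorted({row[0] for row in index})
    rng = random.Random(random_state)
    rng.shuffle(files)
    # assumption: the proportion is taken over files, not rows
    n_test = int(round(test_size * len(files)))
    test_files = set(files[:n_test])
    train = [row for row in index if row[0] not in test_files]
    test = [row for row in index if row[0] in test_files]
    return train, test


# five files with two clips each (file names are made up)
index = [(f"file_{i}.wav", start, start + 3.0)
         for i in range(5) for start in (0.0, 3.0)]
train_rows, test_rows = split_by_file(index, test_size=0.2, random_state=0)
train_files = {row[0] for row in train_rows}
test_files = {row[0] for row in test_rows}
```

Because whole files are assigned to one side, no file contributes clips to both splits, which is the leakage the by_file option is meant to prevent.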
- opensoundscape.data_selection.upsample(input_df, label_column='Labels', with_replace=False, random_state=None)[source]
Given an input DataFrame of categorical labels, upsample to maximum value
Upsampling removes the class imbalance in your dataset. Rows for each label are repeated up to max_count // rows. Then, we randomly sample the rows to fill up to max_count.
The input df is NOT one-hot encoded in this case, but instead contains categorical labels in a specified label_column
- Parameters:
input_df – A DataFrame to upsample
label_column – The column to draw unique labels from
with_replace – flag to enable sampling of the same row more than once [default: False]
random_state – Set the random_state during sampling
- Returns:
An upsampled DataFrame
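The repeat-then-fill procedure described for upsample can be sketched without pandas. This is an illustration only: upsample_rows and its label_of argument are hypothetical, and rows are plain tuples rather than DataFrame rows.

```python
import random
from collections import Counter, defaultdict


def upsample_rows(rows, label_of, with_replace=False, random_state=None):
    """Hypothetical helper: repeat rows so every label reaches max_count."""
    rng = random.Random(random_state)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    max_count = max(len(group) for group in groups.values())
    out = []
    for group in groups.values():
        # repeat whole group max_count // rows times ...
        repeats, remainder = divmod(max_count, len(group))
        out.extend(group * repeats)
        # ... then randomly sample rows to fill up to max_count
        if remainder:
            if with_replace:
                out.extend(rng.choices(group, k=remainder))
            else:
                out.extend(rng.sample(group, remainder))
    return out


rows = [("f1.wav", "bird"), ("f2.wav", "bird"), ("f3.wav", "bird"),
        ("f4.wav", "frog"), ("f5.wav", "frog")]
balanced = upsample_rows(rows, label_of=lambda r: r[1], random_state=0)
counts = Counter(label for _, label in balanced)
```

Here the two frog rows are repeated once (3 // 2 = 1) and one more frog row is sampled to reach the maximum count of 3.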
opensoundscape.logging module
helpers for integrating with WandB and exporting content
- opensoundscape.logging.wandb_table(dataset, n=None, classes_to_extract=(), random_state=None, raise_exceptions=False, drop_labels=False, gradcam_model=None)[source]
Generate a wandb Table visualizing n random samples from a sample_df
- Parameters:
dataset – object to generate samples, eg AudioFileDataset or AudioSplittingDataset
n – number of samples to generate (randomly selected from df) - if None, does not subsample or change order
bypass_augmentations – if True, augmentations in Preprocessor are skipped
classes_to_extract – tuple of classes - will create columns containing the scores/labels
random_state – default None; if integer provided, used for reproducible random sample
drop_labels – if True, does not include ‘label’ column in Table
gradcam_model – if not None, will generate GradCAMs for each sample using gradcam_model.get_cams() - requires optional dependency pip install grad-cam
Returns: a W&B Table of preprocessed samples with labels and playable audio
opensoundscape.metrics module
- opensoundscape.metrics.multi_target_metrics(targets, scores, class_names, threshold)[source]
generate various metrics for a set of scores and labels (targets)
- Parameters:
targets – 0/1 labels in 2d array
scores – continuous values in 2d array
class_names – list of strings
threshold – scores >= threshold result in prediction of 1, while scores < threshold result in prediction of 0
- Returns:
metrics_dict: dictionary of various overall and per-class metrics
- precision, recall, F1 are np.nan if no 1-labels for a class
- au_roc, avg_precision are np.nan if all labels are either 0 or 1
Definitions:
- au_roc: area under the receiver operating characteristic curve
- avg_precision: average precision (same as area under PR curve)
- Jaccard: Jaccard similarity coefficient score (intersection over union)
- hamming_loss: fraction of labels that are incorrectly predicted
- opensoundscape.metrics.predict_multi_target_labels(scores, threshold)[source]
Generate boolean multi-target predicted labels from continuous scores
For each sample, each class score is compared to a threshold. Any class can be predicted 1 or 0, independent of other classes.
This function internally uses torch.Tensors to optimize performance
Note: threshold can be a single value or list of per-class thresholds
- Parameters:
scores – 2d np.array, 2d list, 2d torch.Tensor, or pd.DataFrame containing continuous scores
threshold – a number or list of numbers with a threshold for each class
- if a single number, used as a threshold for all classes (columns)
- if a list, length should match number of columns in scores. Each value in the list will be used as a threshold for each respective class (column).
Returns: 1/0 values with 1 if score exceeded threshold and 0 otherwise
See also: predict_single_target_labels
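The per-class thresholding can be sketched on nested lists. This is not the library's torch-based implementation: multi_target_labels is a hypothetical helper, and it assumes the score >= threshold rule stated for multi_target_metrics.

```python
def multi_target_labels(scores, threshold):
    """Hypothetical helper: 0/1 labels from a 2d list of scores."""
    n_classes = len(scores[0])
    if isinstance(threshold, (list, tuple)):
        thresholds = list(threshold)
    else:
        thresholds = [threshold] * n_classes
    if len(thresholds) != n_classes:
        raise ValueError("need one threshold per class (column)")
    # assumption: scores equal to the threshold count as positive
    return [[1 if s >= t else 0 for s, t in zip(row, thresholds)]
            for row in scores]


scores = [[0.9, 0.2],
          [0.4, 0.7]]
shared = multi_target_labels(scores, 0.5)
per_class = multi_target_labels(scores, [0.3, 0.8])
```

Each class is thresholded independently, so a row can end up with zero, one, or several positive labels.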
- opensoundscape.metrics.predict_single_target_labels(scores)[source]
Generate boolean single target predicted labels from continuous scores
For each row, the single highest scoring class will be labeled 1 and all other classes will be labeled 0.
This function internally uses torch.Tensors to optimize performance
- Parameters:
scores – 2d np.array, 2d list, 2d torch.Tensor, or pd.DataFrame containing continuous scores
Returns: boolean value where each row has 1 for the highest scoring class and 0 for all other classes. Returns same datatype as input.
See also: predict_multi_target_labels
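For comparison, single-target prediction reduces to a per-row argmax. The sketch below uses a hypothetical helper single_target_labels on nested lists; how the library breaks ties is not stated here, so picking the first maximum is an assumption.

```python
def single_target_labels(scores):
    """Hypothetical helper: one-hot label for the top-scoring class per row."""
    labels = []
    for row in scores:
        # index of the highest score (first one wins on ties: an assumption)
        best = max(range(len(row)), key=row.__getitem__)
        labels.append([1 if i == best else 0 for i in range(len(row))])
    return labels


one_hot = single_target_labels([[0.1, 0.7, 0.2],
                                [2.0, -1.0, 0.5]])
```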
- opensoundscape.metrics.single_target_metrics(targets, scores)[source]
generate various metrics for a set of scores and labels (targets)
Predicts 1 for the highest scoring class per sample and 0 for all other classes.
- Parameters:
targets – 0/1 labels in 2d array
scores – continuous values in 2d array
- Returns:
dictionary of various overall and per-class metrics
- Return type:
metrics_dict
opensoundscape.ribbit module
Detect periodic vocalizations with RIBBIT
This module provides functionality to search audio for periodically fluctuating vocalizations.
- opensoundscape.ribbit.calculate_pulse_score(amplitude, amplitude_sample_rate, pulse_rate_range, plot=False, nfft=1024)[source]
Search for amplitude pulsing in an audio signal in a range of pulse repetition rates (PRR)
scores an audio amplitude signal by highest value of power spectral density in the PRR range
Note: the implementation of Spectrogram.net_amplitude() changed in opensoundscape v0.13.0, which results in vastly different absolute values of the RIBBIT score. However, relative scores between audio files are typically similar.
- Parameters:
amplitude – a time series of the audio signal’s amplitude (for instance a smoothed raw audio signal)
amplitude_sample_rate – sample rate in Hz of amplitude signal, normally ~20-200 Hz
pulse_rate_range – [min, max] values for amplitude modulation in Hz
plot=False – if True, creates a plot visualizing the power spectral density
nfft=1024 – controls the resolution of the power spectral density (see scipy.signal.welch)
- Returns:
pulse rate score for this audio segment (float)
- opensoundscape.ribbit.ribbit(spectrogram, signal_band, pulse_rate_range, clip_duration, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='remainder', noise_bands=None, spec_clip_range=(-100, -20), plot=False)[source]
Run RIBBIT detector to search for periodic calls in audio
Searches for periodic energy fluctuations at specific repetition rates and frequencies.
- Parameters:
spectrogram – opensoundscape.Spectrogram object of an audio file
signal_band – [min, max] frequency range of the target species, in Hz
pulse_rate_range – [min,max] pulses per second for the target species
clip_duration – the length of audio (in seconds) to analyze at one time - each clip is analyzed independently and receives a ribbit score
clip_overlap (float) – overlap between consecutive clips (sec)
overlap_fraction (float) – overlap between consecutive clips as a fraction of clip_duration
clip_step (float) – step size between consecutive clips (sec) - only one of clip_overlap, overlap_fraction, or clip_step should be provided - if all are None, defaults to clip_overlap=0
final_clip (str) –
behavior if final clip is less than clip_duration seconds long [default: “remainder”]. Options:
- None: Discard the remainder (do not make a clip)
- “remainder”: Use only remainder of Audio (final clip will be shorter than clip_duration)
- “full”: Increase overlap with previous clip to yield a clip with clip_duration length
- “extend”: Extend the final clip with zeros (silence) to yield a clip with clip_duration length
noise_bands – list of frequency ranges to subtract from the signal_band For instance: [ [min1,max1] , [min2,max2] ] - if None, no noise bands are used - default: None
spec_clip_range – tuple of (low,high) spectrogram values. The values in spectrogram will be clipped to this range (spectrogram.limit_range()) - Default of (-100,-20) matches default decibel_limits parameter of earlier opensoundscape versions, which clipped spectrogram values to this range when the spectrogram was initialized.
plot=False – if True, plot the power spectral density for each clip
- Returns:
DataFrame with columns [‘start_time’,’end_time’,’score’], with a row for each clip.
Notes
__PARAMETERS__ RIBBIT requires the user to select a set of parameters that describe the target vocalization. Here is some detailed advice on how to use these parameters.
Signal Band: The signal band is the frequency range where RIBBIT looks for the target species. It is best to pick a narrow signal band if possible, so that the model focuses on a specific part of the spectrogram and has less potential to include erroneous sounds.
Noise Bands: Optionally, users can specify other frequency ranges called noise bands. Sounds in the noise_bands are _subtracted_ from the signal_band. Noise bands help the model filter out erroneous sounds from the recordings, which could include confusion species, background noise, and popping/clicking of the microphone due to rain, wind, or digital errors. It’s usually good to include one noise band for very low frequencies – this specifically eliminates popping and clicking from being registered as a vocalization. It’s also good to specify noise bands that target confusion species. Another approach is to specify two narrow noise_bands that are directly above and below the signal_band.
Pulse Rate Range: This parameter specifies the minimum and maximum pulse rate (the number of pulses per second, also known as pulse repetition rate) RIBBIT should look for to find the focal species. For example, choosing pulse_rate_range = [10, 20] means that RIBBIT should look for pulses no slower than 10 pulses per second and no faster than 20 pulses per second.
Clip Duration: The clip_duration parameter tells RIBBIT how many seconds of audio to analyze at one time. Generally, you should choose a clip_duration that is similar to the length of the target species vocalization, or a little bit longer. For very slowly pulsing vocalizations, choose a longer window so that at least 5 pulses can occur in one window (0.5 pulses per second -> 10 second window). Typical values are 0.3 to 10 seconds. Also, clip_overlap can be used for overlap between sequential clips. This is more computationally expensive but will be more likely to center a target sound in the clip (with zero overlap, the target sound may be split up between adjacent clips).
Plot: We can choose to show the power spectrum of pulse repetition rate for each window by setting plot=True. The default is not to show these plots (plot=False).
__ALGORITHM__ This is the procedure RIBBIT follows:
- divide the audio into segments of length clip_duration
- for each clip:
  - calculate time series of energy in signal band (signal_band) and subtract noise band energies (noise_bands)
  - calculate power spectral density of the amplitude time series
  - score the file based on the max value of power spectral density in the pulse rate range
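The scoring step of the procedure above (maximum of the power spectral density within the pulse rate range) can be sketched in pure Python. This is a simplified stand-in: the hypothetical pulse_score below uses a plain periodogram computed with a naive DFT, whereas calculate_pulse_score refers to scipy.signal.welch, so absolute values will differ.

```python
import cmath
import math


def pulse_score(amplitude, amplitude_sample_rate, pulse_rate_range):
    """Hypothetical helper: max periodogram power within pulse_rate_range."""
    n = len(amplitude)
    mean = sum(amplitude) / n
    centered = [a - mean for a in amplitude]  # remove the DC component
    low, high = pulse_rate_range
    best = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * amplitude_sample_rate / n  # frequency of DFT bin k in Hz
        if low <= freq <= high:
            coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                        for i, c in enumerate(centered))
            best = max(best, abs(coeff) ** 2 / n)
    return best


# 2 seconds of an amplitude signal pulsing at 15 Hz, sampled at 100 Hz
sr = 100
amp = [1 + math.sin(2 * math.pi * 15 * i / sr) for i in range(200)]
in_band = pulse_score(amp, sr, [10, 20])
out_band = pulse_score(amp, sr, [30, 40])
```

The 15 Hz pulsing produces a large score when the pulse rate range brackets it and a score near zero when the range is set elsewhere.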
opensoundscape.sample module
Class for holding information on a single sample
- class opensoundscape.sample.AudioSample(source, start_time=None, duration=None, labels=None, trace=None, sample_rate=None)[source]
Bases: Sample
A class containing information about a single audio sample
self.preprocessing_exception is initialized as None and will contain the exception raised during preprocessing if any exception occurs
- property categorical_labels
list of indices with value==1 in self.labels
- property end_time
calculate sample end time as start_time + duration
- classmethod from_series(labels_series, rounding_precision=10, audio_root=None)[source]
initialize AudioSample from a pandas Series (optionally containing labels)
if series name (dataframe index) is tuple, extracts [‘file’,’start_time’,’end_time’] and maps these values to (source, start_time, duration=end_time-start_time) - otherwise, series name extracted as source; start_time and duration will be None
Extracts source (file), start_time, and end_time from multi-index pd.Series (one row of a pd.DataFrame with multi index [‘file’,’start_time’,’end_time’]). The argument series is saved as self.labels. If sparse, converts to dense. Creates an AudioSample object.
- Parameters:
labels_series – a pd.Series with name = file path or [‘file’,’start_time’,’end_time’] and index as classes with 0/1 values as labels. Labels can have no values (just a name) if sample does not have labels.
rounding_precision – rounds duration to this many decimals to avoid floating point precision errors. Pass None for no rounding. Default: 10 decimal places
audio_root – optionally pass a root directory (pathlib.Path or str) to prepend to each file path - if None (default), value of file must be full path
opensoundscape.signal_processing module
Signal processing tools for feature extraction and more
- opensoundscape.signal_processing.cwt_peaks(audio, center_frequency, wavelet='morl', peak_threshold=0.2, peak_separation=None, plot=False)[source]
compute a cwt, post-process, then extract peaks
Performs a continuous wavelet transform (cwt) on an audio signal at a single frequency. It then squares, smooths, and normalizes the signal. Finally, it detects peaks in the resulting signal and returns the times and magnitudes of detected peaks. It is used as a feature extractor for Ruffed Grouse drumming detection.
- Parameters:
audio – an Audio object
center_frequency – the target frequency to extract peaks from
wavelet – (str) name of a pywt wavelet, eg ‘morl’ (see pywt docs)
peak_threshold – minimum height of peaks - if None, no minimum peak height - see “height” argument to scipy.signal.find_peaks
peak_separation – minimum time between detected peaks, in seconds - if None, no minimum distance - see “distance” argument to scipy.signal.find_peaks
- Returns:
peak_times: list of times (from beginning of signal) of each peak
peak_levels: list of magnitudes of each detected peak
Note
consider downsampling audio to reduce computational cost. Audio must have sample rate of at least 2x target frequency.
- opensoundscape.signal_processing.detect_peak_sequence_cwt(audio, sample_rate=400, window_len=60, center_frequency=50, wavelet='morl', peak_threshold=0.2, peak_separation=0.0375, dt_range=(0.05, 0.8), dy_range=(-0.2, 0), d2y_range=(-0.05, 0.15), max_skip=3, duration_range=(1, 15), points_range=(9, 100), plot=False)[source]
Use a continuous wavelet transform to detect accelerating sequences
This function creates a continuous wavelet transform (cwt) feature and searches for accelerating sequences of peaks in the feature. It was developed to detect Ruffed Grouse drumming events in audio signals. Default parameters are tuned for Ruffed Grouse drumming detection.
Analysis is performed on analysis windows of fixed length without overlap. Detections from each analysis window across the audio file are aggregated.
- Parameters:
audio – Audio object
sample_rate=400 – resample audio to this sample rate (Hz)
window_len=60 – length of analysis window (sec)
center_frequency=50 – target audio frequency of cwt
wavelet='morl' – (str) pywt wavelet name (see pywavelets docs)
peak_threshold=0.2 – height threshold (0-1) for peaks in normalized signal
peak_separation=15/400 – min separation (sec) for peak finding
dt_range= (0.05, 0.8) – sequence detection point-to-point criterion 1 - Note: the upper limit is also used as sequence termination criterion 2
dy_range= (-0.2, 0) – sequence detection point-to-point criterion 2
d2y_range= (-0.05, 0.15) – sequence detection point-to-point criterion 3
max_skip=3 – sequence termination criterion 1: max sequential invalid points
duration_range= (1, 15) – sequence criterion 1: length (sec) of sequence
points_range= (9, 100) – sequence criterion 2: num points in sequence
plot=False – if True, plot peaks and detected sequences with pyplot
- Returns:
dataframe summarizing detected sequences
Note: for Ruffed Grouse drumming, which is very low pitched, audio is resampled to 400 Hz. This greatly increases the efficiency of the cwt, but will only detect frequencies up to 400/2=200Hz. Generally, choose a resample frequency as low as possible but >=2x the target frequency
Note: the cwt signal is normalized on each analysis window, so changing the analysis window size can change the detection results.
Note: if there is an incomplete window remaining at the end of the audio file, it is discarded (not analyzed).
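The windowing and resampling notes above amount to two small calculations, sketched here with hypothetical helper names (analysis_windows and min_resample_rate are not part of the library).

```python
def analysis_windows(duration, window_len=60):
    """Hypothetical helper: non-overlapping windows; the remainder is dropped."""
    n_windows = int(duration // window_len)
    return [(i * window_len, (i + 1) * window_len) for i in range(n_windows)]


def min_resample_rate(target_frequency):
    """Lowest sample rate (Hz) whose Nyquist frequency reaches the target."""
    return 2 * target_frequency


windows = analysis_windows(150, window_len=60)
lowest_rate = min_resample_rate(50)
```

For a 150 second file and 60 second windows, the final 30 seconds fall in an incomplete window and are not analyzed. A 50 Hz target needs a sample rate of at least 100 Hz; the default of 400 Hz leaves headroom up to 200 Hz.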
- opensoundscape.signal_processing.find_accel_sequences(t, dt_range=(0.05, 0.8), dy_range=(-0.2, 0), d2y_range=(-0.05, 0.15), max_skip=3, duration_range=(1, 15), points_range=(5, 100))[source]
detect accelerating/decelerating sequences in time series
developed for detecting Ruffed Grouse drumming events in a series of peaks extracted from cwt signal
The algorithm computes the forward difference of t, y(t). It iterates through the [y(t), t] points searching for sequences of points that meet a set of conditions. It begins with an empty candidate sequence.
“Point-to-point criteria”: Valid ranges for dt, dy, and d2y are checked for each subsequent point and are based on previous points in the candidate sequence. If they are met, the point is added to the candidate sequence.
“Continuation criteria”: Conditions for max_skip and the upper bound of dt are used to determine when a sequence should be terminated.
max_skip: max number of sequential invalid points before terminating
dt<=dt_range[1]: if dt is long, sequence should be broken
“Sequence criteria”: When a sequence is terminated, it is evaluated on conditions for duration_range and points_range. If it meets these conditions, it is saved as a detected sequence.
duration_range: length of sequence in seconds from first to last point
points_range: number of points included in sequence
When a sequence is terminated, the search continues with the next point and an empty sequence.
- Parameters:
t – (list or np.array) times of all detected peaks (seconds)
dt_range= (0.05,0.8) – valid values for t(i) - t(i-1)
dy_range= (-0.2,0) – valid values for change in y (grouse: difference in time between consecutive beats should decrease)
d2y_range= (-.05,.15) – limit change in dy: should not show large decrease (sharp curve downward on y vs t plot)
max_skip=3 – max invalid points between valid points for a sequence (grouse: should not have many noisy points between beats)
duration_range= (1,15) – total duration of sequence (sec)
points_range= (5,100) – total number of points in sequence
- Returns:
lists of t and y for each detected sequence
- Return type:
sequences_t, sequences_y
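The forward difference and two of the point-to-point checks can be sketched as follows. This is a deliberately reduced illustration, not the full algorithm: meets_point_criteria is a hypothetical helper that ignores d2y_range, max_skip, and the sequence criteria, and treating the range bounds as inclusive is an assumption.

```python
def forward_difference(values):
    """y[i] = values[i + 1] - values[i]"""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]


def meets_point_criteria(t, dt_range=(0.05, 0.8), dy_range=(-0.2, 0)):
    """Hypothetical check: all intervals and interval changes are in range."""
    y = forward_difference(t)   # time between consecutive peaks
    dy = forward_difference(y)  # change in that interval
    dt_ok = all(dt_range[0] <= v <= dt_range[1] for v in y)
    dy_ok = all(dy_range[0] <= v <= dy_range[1] for v in dy)
    return dt_ok and dy_ok


speeding_up = meets_point_criteria([0.0, 0.5, 0.9, 1.2, 1.4])
slowing_down = meets_point_criteria([0.0, 0.2, 0.5, 0.9])
```

Peaks whose spacing shrinks from 0.5 s to 0.2 s pass both checks, while peaks whose spacing grows fail the dy_range check because dy is positive.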
- opensoundscape.signal_processing.frequency2scale(frequency, wavelet, sample_rate)[source]
determine appropriate wavelet scale for desired center frequency
- Parameters:
frequency – desired center frequency of wavelet in Hz (1/seconds)
wavelet – (str) name of pywt wavelet, eg ‘morl’ for Morlet
sample_rate – sample rate in Hz (1/seconds)
- Returns:
(float) scale parameter for pywt.cwt() to extract desired frequency
- Return type:
scale
Note: this function is not exactly an inverse of pywt.scale2frequency(), because that function returns frequency in sample-units (cycles/sample) rather than frequency in Hz (cycles/second). In other words, frequency_hz = pywt.scale2frequency(w,scale)*sr.
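The unit conversion in the note can be written out explicitly. The sketch below avoids importing pywt and instead takes the wavelet's central frequency as an argument; it assumes pywt.scale2frequency(w, scale) equals central_frequency / scale, and the value 0.8125 for the Morlet wavelet is likewise an assumption to check against pywt.central_frequency('morl'). The function names are hypothetical.

```python
def frequency_to_scale(frequency_hz, central_frequency, sample_rate):
    """Scale whose center frequency is frequency_hz at the given sample rate."""
    return central_frequency * sample_rate / frequency_hz


def scale_to_frequency_hz(scale, central_frequency, sample_rate):
    """Inverse: cycles/sample (central_frequency / scale) times samples/second."""
    return central_frequency / scale * sample_rate


MORLET_CENTRAL_FREQUENCY = 0.8125  # assumed value for the 'morl' wavelet

scale = frequency_to_scale(50, MORLET_CENTRAL_FREQUENCY, sample_rate=400)
recovered_hz = scale_to_frequency_hz(scale, MORLET_CENTRAL_FREQUENCY, 400)
```

With these assumed values, a 50 Hz target at a 400 Hz sample rate corresponds to a scale of 6.5.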
- opensoundscape.signal_processing.gcc(x, y, cc_filter='phat', frequency_range=None, sample_rate=None, epsilon=0.001)[source]
Generalized cross correlation of two signals
Computes a generalized cross correlation in frequency space.
This implementation also allows restricting the frequency range considered by GCC.
The generalized cross correlation algorithm is described in Knapp and Carter [1].
In the case of cc_filter=’cc’, gcc simplifies to cross correlation and is equivalent to scipy.signal.correlate and numpy.correlate.
code adapted from github.com/axeber01/ngcc
- Parameters:
x – 1d numpy array of audio samples
y – 1d numpy array of audio samples
cc_filter – which filter to use in the gcc:
- ‘phat’: Phase transform (default)
- ‘roth’: Roth correlation (1971)
- ‘scot’: Smoothed Coherence Transform
- ‘ht’: Hannan and Thomson
- ‘cc’: normal cross correlation with no filter
- ‘cc_norm’: normal cross correlation normalized by the length and amplitude of the signal
frequency_range – tuple of (low, high) frequencies to keep in the GCC. If None, all frequencies are kept. First or second value can be None if no lower or upper limit is desired. Note: retaining high frequencies near the Nyquist frequency sometimes results in spurious cross correlation values at 0 or at the beginning/end of the signal when using ‘phat’ and ‘scot’ methods.
sample_rate – sample rate of the signals. Required if using frequency_range.
epsilon – small value used to ensure denominator when applying a filter is non-zero.
- Returns:
1d numpy array of gcc values
- Return type:
gcc
see also: tdoa() uses this function to estimate time delay between two signals
[1] Knapp, C.H. and Carter, G.C (1976) The Generalized Correlation Method for Estimation of Time Delay. IEEE Trans. Acoust. Speech Signal Process, 24, 320-327. http://dx.doi.org/10.1109/TASSP.1976.1162830
- opensoundscape.signal_processing.tdoa(signal, reference_signal, max_delay, cc_filter='phat', sample_rate=1, return_max=False, frequency_range=None)[source]
Estimate time difference of arrival between two signals
estimates time delay by finding the maximum of the generalized cross correlation (gcc) of two signals. The two signals are discrete-time series with the same sample rate.
Only the central portion of the signal, from max_delay after the start and max_delay before the end, is used for the calculation. All of the reference signal is used. This means that tdoa(sig, ref_sig, max_delay) will not necessarily be the same as -tdoa(ref_sig, sig, max_delay).
For example, if the signal arrives 2.5 seconds _after_ the reference signal, returns 2.5; if it arrives 0.5 seconds _before_ the reference signal, returns -0.5.
- Parameters:
signal – np.array or list object containing the signal of interest
reference_signal – np.array or list containing the reference signal. Both audio recordings must be time-synchronized.
max_delay – maximum possible tdoa (seconds) between the two signals. Cannot be longer than 1/2 the duration of the signal. The tdoa returned will be between -max_delay and +max_delay. For example, if max_delay=0.5, the tdoa returned will be the delay between -0.5 and +0.5 seconds, that maximizes the cross-correlation. This is useful if you know the maximum possible delay between the two signals, and want to ignore any tdoas outside of that range. e.g. if receivers are 100m apart, and the speed of sound is 340m/s, then the maximum possible delay is 0.294 seconds.
cc_filter – see gcc()
sample_rate – sample rate (Hz) of signals; both signals must have same sample rate
return_max – if True, returns the maximum value of the generalized cross correlation
frequency_range – tuple of (low, high) frequencies in Hz to keep in the GCC. If None, all frequencies are kept. First or second value can be None if no lower or upper limit is desired. Note: retaining high frequencies near the Nyquist frequency sometimes results in spurious cross correlation values at 0 or at the beginning/end of the signal when using ‘phat’ and ‘scot’ methods.
- Returns:
estimated delay from reference signal to signal, in seconds (note that the default sampling rate is 1.0 samples/second)
if return_max is True, returns a second value, the maximum value of the result of generalized cross correlation
See also: gcc() if you want the raw output of generalized cross correlation
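The sign convention and the role of max_delay can be sketched in pure Python. This is a minimal illustration using plain (unweighted) cross-correlation, not the GCC with 'phat' filtering that tdoa() uses, and it does not call the library; the helper names max_possible_delay and estimate_delay are hypothetical.

```python
def max_possible_delay(receiver_distance_m, speed_of_sound=340.0):
    """Largest physically possible delay (seconds) between two receivers."""
    return receiver_distance_m / speed_of_sound


def estimate_delay(signal, reference, max_delay, sample_rate=1):
    """Delay (seconds) of `signal` relative to `reference`.

    Plain cross-correlation searched over lags in [-max_delay, +max_delay].
    No PHAT/SCOT weighting is applied (hypothetical helper for illustration).
    """
    max_lag = int(max_delay * sample_rate)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, ref_value in enumerate(reference):
            m = n + lag  # index of the signal sample aligned with reference[n]
            if 0 <= m < len(signal):
                score += signal[m] * ref_value
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# The same impulse appears 3 samples later in `signal` than in `reference`.
reference = [0.0] * 50
signal = [0.0] * 50
reference[10] = 1.0
signal[13] = 1.0
delay = estimate_delay(signal, reference, max_delay=5, sample_rate=1)
```

A positive result means the signal arrives after the reference; swapping the arguments flips the sign in this simplified version. For receivers 100 m apart, max_possible_delay(100) gives roughly 0.294 seconds, matching the example in the max_delay description.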
- opensoundscape.signal_processing.thresholded_event_durations(x, threshold, normalize=False, sample_rate=1)[source]
Detect positions and durations of events over threshold in 1D signal
This function takes a 1D numeric vector and searches for segments that are continuously greater than a threshold value. The input signal can optionally be normalized, and if a sample rate is provided the start positions will be in the units of [sr]^-1 (ie if sr is Hz, start positions will be in seconds).
- Parameters:
x – 1d input signal, a vector of numeric values
threshold – minimum value of signal to be a detection
normalize – if True, performs x=x/max(x)
sample_rate – sample rate of input signal
- Returns:
start_times: start time of each detected event
durations: duration (# samples/sr) of each detected event
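The run-detection idea described above can be sketched without the library. The helper event_runs below is hypothetical, works on plain lists, and assumes that a sample equal to the threshold counts as a detection (the documentation does not state whether the comparison is strict).

```python
def event_runs(x, threshold, normalize=False, sample_rate=1):
    """Return (start_times, durations) of runs where x >= threshold."""
    if normalize:
        peak = max(x)
        x = [value / peak for value in x]
    start_times, durations = [], []
    run_start = None  # index where the current run began, if any
    for i, value in enumerate(x):
        above = value >= threshold  # assumption: inclusive comparison
        if above and run_start is None:
            run_start = i
        elif not above and run_start is not None:
            start_times.append(run_start / sample_rate)
            durations.append((i - run_start) / sample_rate)
            run_start = None
    if run_start is not None:  # a run that is still open at the end of x
        start_times.append(run_start / sample_rate)
        durations.append((len(x) - run_start) / sample_rate)
    return start_times, durations


starts, durations = event_runs([0, 2, 3, 0, 0, 5, 0], threshold=1, sample_rate=2)
```

With sample_rate=2, sample indices are divided by 2, so the run starting at index 1 is reported at 0.5 seconds.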
opensoundscape.spectrogram module
spectrogram.py: Utilities for dealing with spectrograms
- class opensoundscape.spectrogram.MelSpectrogram(spectrogram, frequencies, times, power_spectrogram=None, stft=None, window_samples=None, hop_samples=None, audio_sample_rate=None, fft_size=None)[source]
Bases: Spectrogram
MelSpectrogram class storing mel-frequency spectrogram values and metadata
A mel spectrogram is a spectrogram with pseudo-logarithmically spaced frequency bins rather than linearly spaced bins.
- audio_sample_rate
- fft_size
- frequencies
- classmethod from_audio(audio, window_samples=None, window_length_sec=None, hop_samples=None, overlap_fraction=None, overlap_samples=None, fft_size=None, n_mels=64, f_min=0, f_max=None, norm=None, mel_scale='htk', **kwargs)[source]
Create a MelSpectrogram object from an Audio object
First creates a spectrogram and a mel-frequency filter bank, then computes the dot product of the filter bank with the spectrogram.
A Mel spectrogram is a spectrogram with a quasi-logarithmic frequency axis that has often been used in language processing and other domains.
The kwargs for the mel frequency bank are documented at:
- https://librosa.org/doc/latest/generated/librosa.feature.melspectrogram.html#librosa.feature.melspectrogram
- https://librosa.org/doc/latest/generated/librosa.filters.mel.html?librosa.filters.mel
- Parameters:
audio – Audio object
window_type="hann" – see scipy.signal.spectrogram docs for description
window_samples – number of audio samples per spectrogram window (pixel) - Defaults to 512 if window_samples and window_length_sec are None - Note: cannot specify both window_samples and window_length_sec
window_length_sec – length of a single window in seconds - Note: cannot specify both window_samples and window_length_sec
hop_samples – number of samples between consecutive windows - Note: specify at most one of (hop_samples, overlap_fraction, overlap_samples)
overlap_fraction – fractional temporal overlap between consecutive windows - Defaults to 0.5 if hop_samples and overlap_fraction are None - Note: specify at most one of (hop_samples, overlap_fraction, overlap_samples)
overlap_samples=None – number of overlapping samples between consecutive windows - Note: specify at most one of (hop_samples, overlap_fraction, overlap_samples)
fft_size – number of fft points, see torchaudio.transforms.Spectrogram If None, defaults to window_samples
n_mels – Number of mel bands to generate [default: 64] Note: n_mels should be chosen for compatibility with the Spectrogram parameter window_samples. Choosing a value > ~ window_samples/10 will result in zero-valued rows while small values blend rows from the original spectrogram.
power – Exponent for the magnitude spectrogram [default: 2]
f_min – lowest frequency (in Hz) represented in the mel spectrogram [default: 0]
f_max – highest frequency (in Hz) represented in the mel spectrogram [default: sr/2]
norm – if ‘slaney’, divide the triangular mel weights by the width of the mel band (area normalization). None otherwise. See librosa or torchaudio docs. [default: None]
mel_scale – ‘htk’ [default] or ‘slaney’ slaney: use Slaney-style mel-filter bank htk: use HTK-style mel-filter bank
- Returns:
opensoundscape.spectrogram.MelSpectrogram object
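To make the "pseudo-logarithmic" spacing concrete, the sketch below computes band center frequencies that are evenly spaced on the HTK mel scale, mel = 2595 * log10(1 + f / 700). It is an illustration only: the helper names are hypothetical, and the actual filter bank is built by the underlying audio library rather than by this code.

```python
import math


def hz_to_mel_htk(f_hz):
    """HTK-style conversion from Hz to mels."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bands equally spaced in mels."""
    low, high = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    step = (high - low) / (n_mels + 1)  # n_mels interior points between the edges
    return [mel_to_hz_htk(low + step * (i + 1)) for i in range(n_mels)]


centers = mel_band_centers(n_mels=8, f_min=0, f_max=8000)
# Spacing between neighbouring centers grows with frequency.
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

Because the bands widen with frequency, choosing many mel bands relative to the number of linear frequency bins leaves some low-frequency bands without any bin, which is the source of the zero-valued rows mentioned for n_mels.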
- hop_samples
- plot(ax=None, show_colorbar=False, range=(-80, 0), kHz=False, n_freq_ticks=5)[source]
Plot a spectrogram (e.g., mel) with evenly sized pixels and evenly spaced y-ticks, where tick labels show the corresponding nonlinear frequency values.
- Parameters:
ax (matplotlib.axes.Axes, optional) – Axis to plot on.
show_colorbar (bool) – Include a colorbar.
range (tuple) – (min, max) dB range for color normalization.
kHz (bool) – Plot y-axis in kHz instead of Hz.
n_freq_ticks (int) – number of y-axis ticks to display
- Returns:
matplotlib.axes.Axes
- power_spectrogram
- times
- window_samples
- class opensoundscape.spectrogram.Spectrogram(spectrogram, frequencies, times, power_spectrogram=None, stft=None, window_samples=None, hop_samples=None, audio_sample_rate=None, fft_size=None)[source]
Bases: object
Class storing spectrogram values and metadata
Can be initialized directly from spectrogram, frequency, and time values or created from an Audio object using the .from_audio() method.
Initializing with the pattern Spectrogram(spectrogram,frequencies,times) expects the values passed to be the decibel-valued power spectrogram. To initialize from the linear-valued power spectrogram or the raw STFT values, pass the power_spectrogram or stft arguments to the constructor, and pass spectrogram=None. Regardless of which input method is used, the stored attribute self.power_spectrogram will always be the linear-valued power spectrogram, and the self.spectrogram property will return the decibel-valued power spectrogram.
Note: the spectrogram is defined as the absolute square of the short-time Fourier transform (STFT) of the audio signal [1]. The attribute self.power_spectrogram will always be the linear-valued power spectrogram (i.e. abs(stft)**2). You can also retrieve the magnitude of the STFT via the self.magnitude property (which takes the square root of self.power_spectrogram), or the decibel-valued power via self.spectrogram (which computes 10*log10(self.power_spectrogram)).
- power_spectrogram
(np.ndarray) 2d array of power spectrogram (not dB-scaled)
- frequencies
(list) discrete frequency bins generated by fft
- times
(list) time from beginning of file to the center of each window
- window_samples
number of samples per window when spec was created [default: None]
- overlap_samples
number of samples overlapped in consecutive windows when spec was created [default: None]
- window_type
window fn used to make spectrogram, eg ‘hann’ [default: None]
- audio_sample_rate
sample rate of audio from which spec was created [default: None]
- Properties:
spectrogram: returns the dB-valued power spectrogram
magnitude: returns the linear-valued STFT magnitude (sqrt(self.power_spectrogram))
shape: returns the shape of the spectrogram array as (n_frequencies, n_times)
duration: returns the duration of the audio signal in seconds
window_length_seconds: length of a single fft window, in seconds
window_hop_seconds: time difference (sec) between consecutive windows’ centers
[1] Karlheinz Gröchenig: “Foundations of Time-Frequency Analysis”, Birkhäuser Boston 2001, DOI:10.1007/978-1-4612-0003-1
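The relationship between the three representations described above (linear power, STFT magnitude, and decibel-valued power) can be checked on a single value. This is a scalar sketch with hypothetical helper names; the class applies the same formulas element-wise to arrays.

```python
import math


def power_to_db(power):
    """Decibel-valued power: 10 * log10(power)."""
    return 10.0 * math.log10(power)


def power_to_magnitude(power):
    """STFT magnitude: square root of the linear power."""
    return math.sqrt(power)


stft_value = complex(6.0, 8.0)   # one STFT coefficient, |stft| = 10
power = abs(stft_value) ** 2     # linear power, abs(stft)**2
db = power_to_db(power)
magnitude = power_to_magnitude(power)
```

A power of zero has no finite decibel value, so this sketch only applies to strictly positive power values.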
- audio_sample_rate
- bandpass(min_f, max_f, out_of_bounds_ok=True)[source]
extract a frequency band from a spectrogram
crops the 2-d array of the spectrograms to the desired frequency range by removing rows.
Lowest and highest row kept are those with frequencies closest to min_f and max_f
- Parameters:
min_f – low frequency in Hz for bandpass
max_f – high frequency in Hz for bandpass
out_of_bounds_ok – (bool) if False, raises ValueError if min_f or max_f are not within the range of the original spectrogram’s frequencies [default: True]
- Returns:
bandpassed spectrogram object
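The "closest row" rule can be illustrated on a plain list of frequency bin values. The helpers closest_index and bandpass_rows are hypothetical and only mirror the selection logic described above, not the method's handling of out_of_bounds_ok.

```python
def closest_index(values, target):
    """Index of the element of `values` nearest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def bandpass_rows(frequencies, min_f, max_f):
    """Indices of the lowest and highest rows kept by a bandpass."""
    return closest_index(frequencies, min_f), closest_index(frequencies, max_f)


frequencies = [0, 500, 1000, 1500, 2000, 2500, 3000]
lowest, highest = bandpass_rows(frequencies, min_f=900, max_f=2100)
kept = frequencies[lowest:highest + 1]
```

Because rows are chosen by proximity, the retained band can extend slightly beyond the requested limits, as it does here (1000 to 2000 Hz for a 900 to 2100 Hz request is narrower, while a 1200 to 1800 Hz request would keep the same rows and be wider).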
- property duration
returns the duration of the audio signal in seconds
Note: can be shorter than the audio signal it was created from, depending on the window and hop sizes used to create the spectrogram.
- fft_size
- frequencies
- classmethod from_audio(audio, window_samples=None, window_length_sec=None, hop_samples=None, overlap_fraction=None, overlap_samples=None, fft_size=None, **kwargs)[source]
create a Spectrogram object from an Audio object
Note: the spectrogram is defined as the absolute square of the short-time Fourier transform (STFT) of the audio signal [1]. The attribute self.power_spectrogram will always be the linear-valued power spectrogram (i.e. abs(stft)**2). You can also retrieve the magnitude of the STFT via the self.magnitude property (which takes the square root of self.power_spectrogram), or the decibel-valued power via self.spectrogram (which computes 10*log10(self.power_spectrogram)).
- Parameters:
audio – Audio object
window_samples – number of audio samples per spectrogram window (pixel) - Defaults to 512 if window_samples and window_length_sec are None - Note: cannot specify both window_samples and window_length_sec
window_length_sec – length of a single window in seconds - Note: cannot specify both window_samples and window_length_sec - Warning: specifying this parameter often results in less efficient spectrogram computation because fft_size will not be an optimal value.
hop_samples – number of samples between the start of consecutive windows - Note: specify at most one of (hop_samples, overlap_fraction, overlap_samples)
overlap_fraction – fractional temporal overlap between consecutive windows - Defaults to 0.5 if hop_samples, overlap_fraction, and overlap_samples are None - Note: specify at most one of (hop_samples, overlap_fraction, overlap_samples)
overlap_samples – number of samples overlapped in consecutive windows - Note: specify at most one of (hop_samples, overlap_fraction, overlap_samples)
fft_size – number of fft points, see torchaudio.transforms.Spectrogram If None, defaults to window_samples
**kwargs – kwargs are passed to torchaudio.transforms.Spectrogram
- Returns:
opensoundscape.spectrogram.Spectrogram object
Notes on spectrogram creation: We use torchaudio.transforms.Spectrogram to create the STFT power spectrogram, then normalize by window length to preserve time-domain signal norm (i.e. similar values regardless of the window size parameter). This formulation is equivalent to librosa.magphase(librosa.stft(x,…),power=2)/window_length. Scipy returns spec/window_length, while librosa and torchaudio do not normalize by the window length. A result equivalent to librosa or torchaudio can be obtained via self.power_spectrogram * self.window_samples. Scipy also detrends frame-by-frame by default, resulting in ~0 energy in the 0 Hz bin.
Notes on recovering RMS amplitude from a spectrogram: To recover a windowed signal RMS from a spectrogram, use window_fn=torch.ones, then use the self.rms property to calculate the windowed RMS amplitude.
```python
spec = Spectrogram.from_audio(audio, window_fn=torch.ones)
rms = spec.rms
# similar to:
# S=librosa.magphase(librosa.stft(audio.samples, window=np.ones, center=False,
#     win_length=w,n_fft=w,hop_length=w//2))[0]
# rms = librosa.feature.rms(S=S, frame_length=w)
```
[1] Karlheinz Gröchenig: “Foundations of Time-Frequency Analysis”, Birkhäuser Boston 2001, DOI:10.1007/978-1-4612-0003-1
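The "specify at most one of (hop_samples, overlap_fraction, overlap_samples)" rule and the documented defaults can be summarised in a small resolver. The function resolve_hop_samples is hypothetical, and rounding the overlap to the nearest sample is an assumption of this sketch rather than documented behavior.

```python
def resolve_hop_samples(window_samples=512, hop_samples=None,
                        overlap_fraction=None, overlap_samples=None):
    """Return the hop (in samples) implied by the overlap arguments."""
    given = [v is not None for v in (hop_samples, overlap_fraction, overlap_samples)]
    if sum(given) > 1:
        raise ValueError(
            "specify at most one of hop_samples, overlap_fraction, overlap_samples"
        )
    if hop_samples is not None:
        return hop_samples
    if overlap_samples is not None:
        return window_samples - overlap_samples
    if overlap_fraction is None:
        overlap_fraction = 0.5  # documented default when nothing is specified
    # Assumption: the overlap is rounded to a whole number of samples.
    return window_samples - int(round(window_samples * overlap_fraction))


default_hop = resolve_hop_samples()  # 512-sample window, 50% overlap
```

With the defaults this gives a hop of 256 samples. Dividing window_samples or the hop by the audio sample rate gives the quantities reported by window_length_seconds and window_hop_seconds.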
- hop_samples
- limit_range(min=-100, max=-20)[source]
Limit (clip) the values of the spectrogram to range from min to max
values of self.spectrogram less than min are set to min; values of self.spectrogram greater than max are set to max
similar to Audacity’s gain and range parameters
- Parameters:
min – values lower than this are set to this
max – values higher than this are set to this
- Returns:
Spectrogram object with .spectrogram values clipped to (min,max)
- linear_scale(feature_range=(0, 1), input_range=(-100, -20))[source]
Linearly rescale spectrogram values to a range of values
- Parameters:
feature_range – tuple of (low,high) values for output
input_range – tuple of (min,max) range. Values beyond this range will be clipped to (min,max) before mapping onto the feature_range
- Returns:
Spectrogram object with values rescaled to feature_range
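The clip-then-rescale behavior can be written out for a single dB value. This is a scalar sketch with a hypothetical helper name; it assumes values are clipped to input_range before the linear mapping, as the parameter description suggests.

```python
def linear_scale_value(value, input_range=(-100, -20), feature_range=(0, 1)):
    """Clip `value` to input_range, then map it linearly onto feature_range."""
    in_min, in_max = input_range
    out_low, out_high = feature_range
    clipped = min(max(value, in_min), in_max)
    fraction = (clipped - in_min) / (in_max - in_min)
    return out_low + fraction * (out_high - out_low)


midpoint = linear_scale_value(-60)    # halfway between -100 and -20
too_quiet = linear_scale_value(-120)  # below the input range, clipped
too_loud = linear_scale_value(0)      # above the input range, clipped
```

Values outside input_range saturate at the ends of feature_range rather than extrapolating.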
- property magnitude
returns the linear-valued STFT magnitude (sqrt(self.power_spectrogram))
- min_max_scale(feature_range=(0, 1))[source]
Linearly rescale spectrogram values to feature_range, using the minimum and maximum values of the spectrogram as the input range
- Parameters:
feature_range – tuple of (low,high) values for output
- Returns:
Spectrogram object with values rescaled to feature_range
- net_amplitude(signal_band, reject_bands=None)[source]
create RMS amplitude signal in signal_band and subtract amplitude from reject_bands
The signal and reject bands are rescaled by dividing by their bandwidths in Hz: the amplitude of each reject band is divided by the total bandwidth of all reject_bands, and the amplitude of signal_band is divided by the bandwidth of signal_band.
- Parameters:
signal_band – [low,high] frequency range in Hz (positive contribution)
reject_bands – list of [low,high] frequency ranges in Hz (negative contribution)
- Returns:
time-series array of net amplitude
- pcen(**kwargs)[source]
apply per-channel energy normalization (PCEN) to spectrogram, return 2d numpy array
see: https://librosa.org/doc/latest/generated/librosa.pcen.html#librosa.pcen
- Parameters:
**kwargs – keyword arguments passed to librosa.pcen(), including time_constant, gain, bias, power, eps, and b
- Returns:
2d numpy array with PCEN applied to self.power_spectrogram
- plot(ax=None, show_colorbar=False, range=(-80, 0), kHz=False, cmap='Greys', dB=True)[source]
Plot a spectrogram (e.g., mel) with evenly sized pixels and evenly spaced y-ticks, where tick labels show the corresponding nonlinear frequency values.
- Parameters:
ax (matplotlib.axes.Axes, optional) – Axis to plot on.
show_colorbar (bool) – Include a colorbar.
range (tuple) – (min, max) dB range for color normalization.
kHz (bool) – Plot y-axis in kHz instead of Hz.
cmap (str) – Colormap to use for the spectrogram.
dB (bool) – If True, plot self.spectrogram (dB values). If False, plot self.power_spectrogram.
- Returns:
matplotlib.axes.Axes
- power_spectrogram
- property rms
calculate time-windowed amplitude ie ~rms~ (see NOTE for caveats)
NOTE: the computed values will only be equivalent to the rms of the original signal if the spectrogram was created with a rectangular window, e.g. Spectrogram.from_audio(audio, window_fn=torch.ones).rms (see https://github.com/librosa/librosa/issues/1795 for details)
- Returns:
time-windowed amplitude measurement
- Return type:
np.ndarray
- property shape
returns the shape of the spectrogram array as (n_frequencies, n_times)
- property spectrogram
returns the dB-valued power spectrogram
- times
- to_image(shape=None, channels=1, colormap=None, invert=False, return_type='pil', range=(-80, 0), dB=True)[source]
Create an image from spectrogram (array, tensor, or PIL.Image)
Note: Linearly rescales values in the spectrogram from range (min,max) to [0,255] (PIL.Image) or [0,1] (array/tensor)
The default range is (-80, 0), so, e.g., 0 dB is loudest -> black, -80 dB is quietest -> white
- Parameters:
shape – tuple of output dimensions as (height, width) - if None, retains original shape of self.spectrogram - if first or second value are None, retains original shape in that dimension
channels – eg 3 for rgb, 1 for greyscale - must be 3 to use colormap
colormap – if None, greyscale spectrogram is generated Can be any matplotlib colormap name such as ‘jet’
invert – if True, inverts colors (eg black->white, white->black) via 1-x [default: False]
return_type – type of returned object - [default] ‘pil’: PIL.Image - ‘np’: numpy.ndarray - ‘torch’: torch.tensor
range – tuple of (min,max) values of .spectrogram to map to the lowest/highest pixel values. Values outside this range will be clipped to the min/max values
dB – if True, use self.spectrogram (dB values) for scaling. If False, use self.power_spectrogram (linear values). [default: True]
- Returns:
Image/array with type depending on return_type:
- PIL.Image with c channels, shape w,h given by shape, and values in [0,255]
- np.ndarray with shape [c,h,w] and values in [0,1]
- torch.tensor with shape [c,h,w] and values in [0,1]
- trim(start_time, end_time)[source]
extract a time segment from a spectrogram
first and last columns kept are those with times closest to start_time and end_time
- Parameters:
start_time – in seconds
end_time – in seconds
- Returns:
spectrogram object from extracted time segment
- property window_hop_seconds
calculate time difference (sec) between consecutive windows’ centers
- property window_length_seconds
calculate length of a single fft window, in seconds
- window_samples
- opensoundscape.spectrogram.plot_spectrograms(specs, n_col=3, value_range=(-80, 0), frequency_range=None, titles=None)[source]
plot a grid of spectrograms
- Parameters:
specs – list of Spectrogram objects
n_col – number of columns in the plot grid
value_range – (min, max) value range for plotting
titles – optional list of titles for each subplot
- Returns:
fig, axs: matplotlib figure and axes objects
- opensoundscape.spectrogram.plot_spectrograms_from_audio(audio_clips, n_col=3, value_range=(-80, 0), frequency_range=None, titles=None, **kwargs)[source]
create spectrograms from audio clips and plot them in a grid
- Parameters:
audio_clips – list of Audio objects
n_col – number of columns in the plot grid
value_range – (min, max) value range for plotting
titles – optional list of titles for each subplot
**kwargs – kwargs passed to Spectrogram.from_audio
- Returns:
fig, axs: matplotlib figure and axes objects
opensoundscape.utils module
Utilities for opensoundscape
- exception opensoundscape.utils.GetDurationError[source]
Bases: ValueError
raised if librosa.get_duration(path=f) causes an error
- opensoundscape.utils.cast_np_to_native(x)[source]
if the input is a numpy integer or floating type, cast to native Python int or float
otherwise, input is unaffected
- opensoundscape.utils.filename_first_part(file_path)[source]
Utility function to extract the part of the filename before the first underscore from a file path
- opensoundscape.utils.generate_clip_times_df(full_duration, clip_duration, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='extend', rounding_precision=10)[source]
generate start and end times for equal-length clips
The behavior for incomplete final clips at the end of the full_duration depends on the final_clip parameter.
This function only creates a dataframe with start and end times, it does not perform any actual trimming of audio or other objects.
- Parameters:
full_duration – The amount of time (seconds) to split into clips
clip_duration (float) – The duration in seconds of the clips
clip_overlap (float) – The overlap of the clips in seconds
overlap_fraction (float) – The overlap of the clips as a fraction of clip_duration
clip_step (float) – The increment in seconds between starts of consecutive clips - must only specify one of clip_overlap, overlap_fraction, or clip_step - if all are None, overlap is set to 0
final_clip (str) – Behavior if the final clip is less than clip_duration seconds long [default: "extend"]. Options:
None: Discard the remainder (do not make a clip)
"extend": Extend the final clip beyond full_duration with zeros to reach clip_duration length
"remainder": Use only remainder of full_duration (final clip will be shorter than clip_duration)
"full": Increase overlap with previous clip to yield a clip with clip_duration length. Note: returns entire original audio if it is shorter than clip_duration
rounding_precision (int or None) – number of decimals to round start/end times to - pass None to skip rounding
- Returns:
clip_df: DataFrame with columns for ‘start_time’ and ‘end_time’ of each clip
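The effect of each final_clip option is easiest to see on a small case. The sketch below is a simplified, hypothetical re-implementation that returns a list of (start, end) tuples instead of a DataFrame and adds at most one clip for the incomplete remainder; it is not the library's code.

```python
def clip_times(full_duration, clip_duration, clip_step=None,
               final_clip="extend", rounding_precision=10):
    """List of (start, end) times for fixed-length clips."""
    step = clip_duration if clip_step is None else clip_step
    clips = []
    i = 0
    # Complete clips that fit entirely within full_duration.
    while i * step + clip_duration <= full_duration:
        clips.append((i * step, i * step + clip_duration))
        i += 1
    start = i * step
    if start < full_duration:  # some time is left for an incomplete clip
        if final_clip == "extend":
            clips.append((start, start + clip_duration))
        elif final_clip == "remainder":
            clips.append((start, full_duration))
        elif final_clip == "full":
            clips.append((max(0, full_duration - clip_duration), full_duration))
        # final_clip=None: discard the remainder
    return [(round(s, rounding_precision), round(e, rounding_precision))
            for s, e in clips]


# 10 seconds of audio split into 4-second clips leaves a 2-second remainder.
discarded = clip_times(10, 4, final_clip=None)
extended = clip_times(10, 4, final_clip="extend")
remainder = clip_times(10, 4, final_clip="remainder")
full = clip_times(10, 4, final_clip="full")
```

Only the last clip differs between the options: it is dropped, runs from 8 to 12, runs from 8 to 10, or runs from 6 to 10, respectively.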
- opensoundscape.utils.generate_opacity_colormaps(colors=['#067bc2', '#43a43d', '#ecc30b', '#f37748', '#d56062'])[source]
Create a colormap for each color from transparent to opaque
- opensoundscape.utils.jitter(x, width, distribution='gaussian')[source]
Jitter (add random noise to) each value of x
- Parameters:
x – scalar, array, or nd-array of numeric type
width – multiplier for random variable (stdev for ‘gaussian’ or r for ‘uniform’)
distribution – ‘gaussian’ (default) or ‘uniform’ if ‘gaussian’: draw jitter from gaussian with mu = 0, std = width if ‘uniform’: draw jitter from uniform on [-width, width]
- Returns:
jittered_x: x + random jitter
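The two distributions can be sketched with the standard library's random module. The helper jitter_values is hypothetical, handles only flat lists (not scalars or nd-arrays), and takes an optional rng argument, which is an addition of this sketch for reproducibility.

```python
import random


def jitter_values(x, width, distribution="gaussian", rng=None):
    """Add random noise to each element of the list `x`."""
    rng = rng or random.Random()
    if distribution == "gaussian":
        # mean 0, standard deviation = width
        noise = [rng.gauss(0.0, width) for _ in x]
    elif distribution == "uniform":
        # uniform on [-width, width]
        noise = [rng.uniform(-width, width) for _ in x]
    else:
        raise ValueError("distribution must be 'gaussian' or 'uniform'")
    return [value + n for value, n in zip(x, noise)]


points = [1.0, 2.0, 3.0]
jittered = jitter_values(points, width=0.1, distribution="uniform",
                         rng=random.Random(0))
```

With the uniform distribution every output stays within width of its input, whereas Gaussian jitter has no hard bound.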
- opensoundscape.utils.linear_scale(array, in_range=(0, 1), out_range=(0, 255))[source]
Translate from range in_range to out_range
- Inputs:
in_range: The starting range [default: (0, 1)]
out_range: The output range [default: (0, 255)]
- Outputs:
new_array: A translated array
- opensoundscape.utils.make_clip_df(files, clip_duration, clip_overlap=None, overlap_fraction=None, clip_step=None, final_clip='extend', return_invalid_samples=False, raise_exceptions=False, audio_root=None)[source]
generate df of fixed-length clip start/end times for a set of files
Used internally to prepare a dataframe listing clips of longer audio files
This function creates a single dataframe with audio files as the index and columns: ‘start_time’, ‘end_time’. It will list clips of a fixed duration from the beginning to end of each audio file.
Note: if a label dataframe is passed as files, the labels for each file will be copied to all clips having the corresponding file. If the label dataframe contains multiple rows for a single file, the labels in the _first_ row containing the file path are used as labels for resulting clips.
- Parameters:
files – list of audio file paths, or dataframe with file path as index - if dataframe, columns represent classes and values represent class labels. Labels for a file will be copied to all clips belonging to that file in the returned clip dataframe.
clip_duration (float) – see generate_clip_times_df
clip_overlap (float) – see generate_clip_times_df
overlap_fraction (float) – see generate_clip_times_df
clip_step (float) – see generate_clip_times_df
final_clip (str) – see generate_clip_times_df
return_invalid_samples (bool) – if True, returns additional value, a list of samples that caused exceptions
raise_exceptions (bool) – if True, if exceptions are raised when attempting to check the duration of an audio file, the exception will be raised. If False [default], adds a row to the dataframe with np.nan for ‘start_time’ and ‘end_time’ for that file path.
audio_root – optionally pass a root directory (pathlib.Path or str) - audio_root is prepended to each file path - if None (default), files must contain full paths to files
- Returns:
clip_df: dataframe with multi-index (‘file’, ‘start_time’, ‘end_time’)
- if files is a dataframe, will contain same columns as files
- otherwise, will have no columns
if return_invalid_samples==True, returns (clip_df, invalid_samples)
- Note: default behavior for raise_exceptions is the following:
if an exception is raised (for instance, trying to get the duration of the file), the dataframe will have one row with np.nan for ‘start_time’ and ‘end_time’ for that file path.
- opensoundscape.utils.min_max_scale(array, feature_range=(0, 1))[source]
rescale values in an array linearly to feature_range
- opensoundscape.utils.overlap(r1, r2)[source]
calculate the amount of overlap between two real-numbered ranges
ranges must be [low,high] where low <= high
- opensoundscape.utils.overlap_fraction(r1, r2)[source]
calculate the fraction of r1 (low, high range) that overlaps with r2
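A minimal sketch of both calculations, assuming ranges are given as [low, high] with low <= high. The helper names are hypothetical and this is not the library source; note that the fraction is relative to r1, so the result depends on argument order.

```python
def range_overlap(r1, r2):
    """Length of the intersection of two [low, high] ranges (0 if disjoint)."""
    return max(0, min(r1[1], r2[1]) - max(r1[0], r2[0]))


def range_overlap_fraction(r1, r2):
    """Fraction of r1 that is covered by r2 (r1 must have non-zero length)."""
    return range_overlap(r1, r2) / (r1[1] - r1[0])


shared = range_overlap([0, 10], [8, 12])                 # 2 units in common
fraction_of_first = range_overlap_fraction([0, 10], [8, 12])
fraction_of_second = range_overlap_fraction([8, 12], [0, 10])
```

Here the same 2-unit overlap is 20% of the first range but 50% of the second.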
- opensoundscape.utils.parent_folder_name(file_path)[source]
Utility function to extract the parent folder name from a file path
- opensoundscape.utils.rescale_features(X, rescaling_vector=None)[source]
rescale all features by dividing by the max value for each feature
optionally provide the rescaling vector (1xlen(X) np.array), so that you can rescale a new dataset consistently with an old one
returns rescaled feature set and rescaling vector
- opensoundscape.utils.second_parent_name(file_path)[source]
Utility function to extract the second parent folder name from a file path
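The three path helpers in this module (filename_first_part, parent_folder_name, second_parent_name) can be approximated with pathlib. The functions below are hypothetical stand-ins written for illustration; details such as how a file name without an underscore is handled are assumptions.

```python
from pathlib import PurePosixPath


def first_part(file_path):
    """Part of the file name before the first underscore."""
    return PurePosixPath(file_path).name.split("_")[0]


def parent_name(file_path):
    """Name of the folder that directly contains the file."""
    return PurePosixPath(file_path).parent.name


def second_parent(file_path):
    """Name of the folder one level above the file's parent folder."""
    return PurePosixPath(file_path).parent.parent.name


path = "project/site_A/recorder_3/20240501_060000.wav"
prefix = first_part(path)
folder = parent_name(path)
grandparent = second_parent(path)
```

Helpers like these are typically useful when recorder or site identifiers are encoded in folder names or file name prefixes.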
opensoundscape.vector_database module
utilities for integrating with HopLite vector database
(possibly other vector database libraries in the future)
- opensoundscape.vector_database.find_matching_windows(db, date_range=None, time_range=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None)[source]
Match database windows based on filters for date, time, deployment, project, recording, and annotations
- Parameters:
db – hoplite database containing embeddings
date_range – tuple of (start_date, end_date) to filter clips by date; Formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format; if None, does not filter by date Can pass (date,None) or (None,date) to filter by only start or end date, respectively
time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day Formats: datetime.datetime, datetime.time or string in “HH:MM:SS” format Note: filters by time of day of the _recording_ start time (rather than audio clip start time) Assumes time zone match between time_range values and recording timestamps in the database
deployments – list of deployment names to filter by; if None, does not filter by deployment
projects – list of project names to filter by; if None, does not filter by project
recordings – list of recording names to filter by; if None, does not filter by recording
deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument
recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument
windows_filter – custom filter dict for windows; if provided, overrides date_range, time_range arguments
annotations_filter – custom filter dict for annotations in hoplite DB
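The accepted date_range formats and the open-ended (date, None) / (None, date) forms can be illustrated with a small predicate. This sketch does not query a hoplite database; the helper names are hypothetical, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date, datetime


def to_date(value):
    """Convert a datetime, date, or 'YYYY-MM-DD' string to a date (or None)."""
    if value is None:
        return None
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y-%m-%d").date()


def in_date_range(timestamp, date_range=None):
    """True if `timestamp` falls within (start_date, end_date)."""
    if date_range is None:
        return True  # no date filtering
    start, end = (to_date(v) for v in date_range)
    day = to_date(timestamp)
    if start is not None and day < start:
        return False
    if end is not None and day > end:  # assumption: end date is inclusive
        return False
    return True


recording_start = datetime(2024, 5, 1, 6, 0)
after_april = in_date_range(recording_start, ("2024-04-30", None))
before_may = in_date_range(recording_start, (None, date(2024, 4, 30)))
```

Passing None for one side of the tuple leaves that side of the range unbounded.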
- opensoundscape.vector_database.get_existing_windows(db, files, deployment_id=None, deployment_name=None, project=None)[source]
retrieve db windows for list of files, filtering by deployment/project
- opensoundscape.vector_database.load_or_create_hoplite_usearch_db(db, embedding_dim=None, cfg=None)[source]
helper function to load or create a hoplite database object
- Parameters:
db – a hoplite database object or a path to a hoplite database (folder) - if a path is provided, the database will be created if it does not exist (passing embedding_dim is required in this case); if it does exist, it will be loaded - if a hoplite database object is provided, it will be returned as is
embedding_dim – int, dimension of the embeddings to be stored in the database - only required when creating a new database
cfg – optional config_dict.ConfigDict object with usearch configuration - only used when creating a new database - if None, default usearch config will be used. Keys: ‘embedding_dim’, ‘dtype’, ‘metric_name’, ‘expansion_add’, ‘expansion_search’. Example (default values):
```python
usearch_cfg = config_dict.ConfigDict()
usearch_cfg.embedding_dim = embedding_dim
usearch_cfg.dtype = 'float16'
usearch_cfg.metric_name = 'IP'
usearch_cfg.expansion_add = 256
usearch_cfg.expansion_search = 128
```
- Returns:
a hoplite database object
- opensoundscape.vector_database.normalize_index_to_tuples(idx, rounding_precision=6)[source]
normalize an index of (filename, start_time, end_time) tuples to account for potential float precision issues and Path vs str differences
- opensoundscape.vector_database.normalize_windows_to_tuples(windows, rounding_precision=6)[source]
helper function to convert list of hoplite windows to list of (filename, start_time, end_time) tuples
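Why normalization is needed before comparing window keys can be shown with two keys that refer to the same clip but differ in path type and floating-point noise. The helper normalize_key is hypothetical and handles a single (filename, start_time, end_time) triple rather than a whole index or list of windows.

```python
from pathlib import PurePosixPath


def normalize_key(filename, start_time, end_time, rounding_precision=6):
    """Return a (str, float, float) tuple that is safe to compare or hash."""
    return (
        str(filename),  # Path objects and strings become plain strings
        round(float(start_time), rounding_precision),
        round(float(end_time), rounding_precision),
    )


# Same clip, expressed two slightly different ways.
key_a = normalize_key(PurePosixPath("rec/a.wav"), 0.1 + 0.2, 3.3)
key_b = normalize_key("rec/a.wav", 0.3, 3.3000000001)
same_clip = key_a == key_b
```

Without rounding, 0.1 + 0.2 and 0.3 compare as different floats, so set membership or index lookups on the raw tuples would miss the match.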
- opensoundscape.vector_database.remove_duplicate_windows(db)[source]
utility function to remove duplicate windows from a hoplite database
duplicate = same (recording, start_time, end_time)
- Parameters:
db – a hoplite database object
- opensoundscape.vector_database.similarity_search_hoplite_db(query_embedding, db, num_results=5, exact_search=False, search_subset_size=None, target_score=None, search_kwargs=None)[source]
Perform a similarity search in the Hoplite database.
- Parameters:
query_embedding – np.ndarray of shape (embedding_dim,) representing the embedding of the query audio clip
db – a Hoplite database containing embeddings from the same model
num_results – The number of results to return for each query
exact_search – default False for usearch (faster), if True uses brute force search
search_subset_size – Number of embeddings to compare with. If None [default], all embeddings are searched. For floats between 0 and 1, sample a proportion of the database. For ints, sample the specified number of embeddings. Note: only implemented for exact_search=True
target_score – if specified, searches for similarity scores close to target_score default [None] searches for most similar embeddings
audio_root – root directory for relative paths to query audio files
search_kwargs – dict of additional keyword arguments passed to db.ui.search(), or to brutalism.threaded_brute_search() if exact_search=True. For exact_search=False: radius, threads, exact, log, progress. For exact_search=True: batch_size, max_workers, rng_seed
**embedding_kwargs – additional keyword arguments passed to self.embed(), such as batch_size and num_workers
- Returns:
A list of dictionaries with the search results, one item per query sample. Each item is a dictionary with the following keys:
"query": dictionary with query metadata
"results": list of dictionaries with metadata for each retrieved sample
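Conceptually, an exact (brute-force) search scores every stored embedding against the query and keeps the best matches. The sketch below uses inner-product similarity on plain lists and returns (index, score) pairs; it is a hypothetical illustration that does not use hoplite or usearch, and its return format differs from the dictionaries described above. The target_score handling (ranking by closeness to a target instead of by highest score) is an assumption about what that option means.

```python
def brute_force_search(query, embeddings, num_results=5, target_score=None):
    """Return the top (index, score) pairs by inner-product similarity."""
    scored = [
        (index, sum(q * e for q, e in zip(query, embedding)))
        for index, embedding in enumerate(embeddings)
    ]
    if target_score is None:
        # most similar first
        scored.sort(key=lambda item: item[1], reverse=True)
    else:
        # closest to the requested similarity score first
        scored.sort(key=lambda item: abs(item[1] - target_score))
    return scored[:num_results]


embeddings = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [1.0, 0.0]
top_two = brute_force_search(query, embeddings, num_results=2)
near_half = brute_force_search(query, embeddings, num_results=1, target_score=0.5)
```

Brute-force search scales linearly with the number of embeddings, which is why an approximate index is the default and why sampling a subset is only offered for the exact path.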
- opensoundscape.vector_database.windows_to_dataframe(windows, extra_keys=None)[source]
convert list of hoplite windows to a pandas dataframe with relevant info for each window
- Parameters:
windows – list of hoplite window objects, with attributes filename, offsets, datetime, deployment, project, id
extra_keys – optional list of additional attributes to include in the dataframe, if present in the window objects
- Returns:
pandas dataframe with columns for file, start_time, end_time, datetime, deployment, project, window_id, and any extra_keys specified
opensoundscape.visualization module
- opensoundscape.visualization.annotate(clip_df, indices=None, annotation_buttons=None, dur=None, N=20, bandpass_range=None, dB_range=[-50, 0], cmap='Greys', cell_width=250, cell_height=125, apply_noise_reduction=False, normalize_audio=True, spec_kwargs=None)[source]
Display an interactive grid of spectrograms with annotation toggle buttons.
Each clip is shown as a spectrogram with click-to-play audio. If annotation_buttons is provided, toggle buttons appear below each clip. Activating a button sets the value to True; deactivating it sets the value to None. Annotations are written in-place to annotations_df if provided, otherwise to clip_df.
Optionally pass indices to subset the dataframe to rows to select from.
- Parameters:
clip_df (pd.DataFrame) – DataFrame with columns ‘file’, ‘start_time’, ‘end_time’. Used to load and display audio clips.
indices (list, optional) – indices of clip_df to subset to before selecting clips for display
annotation_buttons (list[str], optional) – Labels for annotation toggle buttons displayed below each clip.
dur (float, optional) – Duration of each audio clip in seconds. If None, uses end_time - start_time for each row.
N (int) – Maximum number of clips to display (randomly sampled).
bandpass_range (tuple, optional) – (min_freq, max_freq) for bandpass filtering the spectrogram.
dB_range (list) – [min_dB, max_dB] for clipping spectrogram values.
cmap (str) – Matplotlib colormap name.
cell_width (int) – Minimum width of each grid cell in pixels.
cell_height (int) – Height of each spectrogram image in pixels.
apply_noise_reduction (bool) – if True uses noisereduce on audio clips with default params
normalize_audio (bool) – if True, normalizes audio clips to peak=1.0
spec_kwargs (dict or None) – keyword arguments to Spectrogram.from_audio()
- Returns:
The displayed widget container.
- Return type:
ipywidgets.GridBox
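A minimal sketch of preparing input for annotate(), assuming clips are described by a file path plus start and end times in seconds. The file names, label strings, and bandpass values are placeholders. The call itself is left commented out because it needs a Jupyter environment with opensoundscape installed and real audio files.

```python
import pandas as pd

# Placeholder clip table: annotate() expects 'file', 'start_time', 'end_time' columns.
clip_df = pd.DataFrame(
    {
        "file": ["rec_001.wav", "rec_001.wav", "rec_002.wav"],  # hypothetical paths
        "start_time": [0.0, 3.0, 12.0],
        "end_time": [3.0, 6.0, 15.0],
    }
)

# Hypothetical labels; each one becomes a toggle button under every clip.
annotation_buttons = ["target_species", "unsure"]

# Keyword arguments mirror names from the documented signature.
annotate_kwargs = dict(
    annotation_buttons=annotation_buttons,
    N=2,                          # show at most 2 randomly sampled clips
    bandpass_range=(1000, 8000),  # (min_freq, max_freq), placeholder values
    dB_range=[-50, 0],            # documented default
)

# In a notebook with opensoundscape installed:
# from opensoundscape.visualization import annotate
# grid = annotate(clip_df, **annotate_kwargs)
```

Because dur is left as None here, each clip's duration comes from end_time - start_time, as described in the parameter list.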
- opensoundscape.visualization.explore_features(df, x_col='x', y_col='y', color_col=None, symbol_col=None, size_col=None, hover_name_col=None, **inspect_kwargs)[source]
- opensoundscape.visualization.explore_histogram(df, value_col, label_col=None, bins=30, **inspect_kwargs)[source]
Interactive histogram for exploring feature distributions.
Displays one overlaid histogram trace per unique value in label_col (or a single trace if label_col is None). Each label gets a toggle button to show/hide its histogram trace and to include/exclude it from the “Inspect random selection” sample.
- Parameters:
df (pd.DataFrame) – DataFrame with columns including value_col and (optionally) label_col, plus ‘file’, ‘start_time’, ‘end_time’ for the inspect callback.
value_col (str) – Column name for numeric values to histogram.
label_col (str, optional) – Column name for categorical labels. If None, all data is shown as a single histogram.
bins (int) – Number of histogram bins.
**inspect_kwargs – Additional keyword arguments passed to inspect().
- Returns:
container (ipywidgets.VBox): Container widget with histogram and controls.
fw (plotly.graph_objects.FigureWidget): The Plotly figure widget.
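To illustrate what "one histogram trace per unique label" means numerically, the sketch below computes per-label bin counts with NumPy. The helper name per_label_histograms is hypothetical, and sharing one set of bin edges across labels is an assumption made so the traces are comparable; it is not taken from the library's implementation.

```python
import numpy as np

def per_label_histograms(values, labels, bins=30):
    """Hypothetical helper: bin counts per unique label over shared bin edges."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # Assumption: all labels share edges computed from the full set of values.
    edges = np.histogram_bin_edges(values, bins=bins)
    counts = {
        label: np.histogram(values[labels == label], bins=edges)[0]
        for label in np.unique(labels)
    }
    return edges, counts

# Toy scores for two labels.
edges, counts = per_label_histograms(
    [0.1, 0.2, 0.9, 0.8, 0.85], ["a", "a", "b", "b", "b"], bins=2
)
```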
- opensoundscape.visualization.inspect(clip_df, dur=None, N=20, bandpass_range=None, dB_range=[-100, -20], cmap='Greys', normalize_audio=False, apply_noise_reduction=False, cell_width=250, cell_height=125, display_inline=True)[source]
Display an interactive grid of spectrograms with click-to-play audio.
- Parameters:
clip_df (pd.DataFrame) – DataFrame with columns (or multi-index) ‘file’, ‘start_time’, and optionally ‘end_time’.
dur (float, optional) – Duration of audio clips in seconds. If None, uses end_time - start_time, which requires an end_time column. If dur is specified but end_time is not present, the clip is centered on start_time.
N (int) – Number of samples to display (randomly selected if more are available).
bandpass_range (tuple, optional) – Frequency range (min_freq, max_freq) for bandpass filtering.
dB_range (list) – [min_dB, max_dB] for spectrogram clipping.
cmap (str) – Matplotlib colormap for spectrograms.
normalize_audio (bool) – Whether to normalize audio clips.
apply_noise_reduction (bool) – Whether to apply noise reduction to audio clips.
cell_width (int) – Width of each cell in the grid (in pixels).
cell_height (int) – Height of each cell in the grid (in pixels).
display_inline (bool) – Whether to display the HTML output immediately.
- Returns:
HTML object with the interactive grid.
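The interaction between dur and end_time described above can be sketched as a small function. The helper name resolve_clip_bounds is hypothetical and this is not the library's code. The case where both dur and end_time are given is not covered by the description, so the handling shown for it is an assumption.

```python
def resolve_clip_bounds(start_time, end_time=None, dur=None):
    """Hypothetical helper: return (clip_start, clip_end) in seconds for one row."""
    if dur is None:
        if end_time is None:
            # Documented requirement: without dur, end_time must be present.
            raise ValueError("either dur or end_time is required")
        return start_time, end_time
    if end_time is None:
        # Documented behavior: with dur but no end_time, center on start_time.
        half = dur / 2
        return start_time - half, start_time + half
    # Assumption (not documented): with both, start at start_time and span dur.
    return start_time, start_time + dur
```

For example, resolve_clip_bounds(10.0, dur=4.0) gives a clip from 8.0 to 12.0 seconds, while resolve_clip_bounds(2.0, end_time=5.0) keeps the row's own bounds.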
opensoundscape.vector_database module
utilities for integrating with HopLite vector database
(possibly other vector database libraries in the future)
- opensoundscape.vector_database.find_matching_windows(db, date_range=None, time_range=None, deployments=None, projects=None, recordings=None, deployments_filter=None, recordings_filter=None, windows_filter=None, annotations_filter=None)[source]
Match database windows based on filters for date, time, deployment, project, recording, and annotations
- Parameters:
db – hoplite database containing embeddings
date_range – tuple of (start_date, end_date) to filter clips by date; if None, does not filter by date. Formats: datetime.datetime, datetime.date, or string in “YYYY-MM-DD” format. Pass (date, None) or (None, date) to filter by only start or end date, respectively.
time_range – tuple of (start_time, end_time) to filter clips by time of day; if None, does not filter by time of day. Formats: datetime.datetime, datetime.time, or string in “HH:MM:SS” format. Note: filters by time of day of the recording start time (rather than the audio clip start time). Assumes the time zone of the time_range values matches that of the recording timestamps in the database.
deployments – list of deployment names to filter by; if None, does not filter by deployment
projects – list of project names to filter by; if None, does not filter by project
recordings – list of recording names to filter by; if None, does not filter by recording
deployments_filter – custom filter dict for deployments; if provided, overrides deployments argument
recordings_filter – custom filter dict for recordings; if provided, overrides recordings argument
windows_filter – custom filter dict for windows; if provided, overrides date_range, time_range arguments
annotations_filter – custom filter dict for annotations in hoplite DB
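The accepted date_range forms can be illustrated with a standard-library sketch. The helpers to_date and in_date_range are hypothetical and do not touch a hoplite database. Treating both endpoints as inclusive is an assumption, since the description does not state it.

```python
from datetime import date, datetime

def to_date(value):
    """Coerce a datetime, date, or 'YYYY-MM-DD' string to a date; pass None through."""
    if value is None:
        return None
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    return datetime.strptime(value, "%Y-%m-%d").date()

def in_date_range(recording_date, date_range=None):
    """Hypothetical check: does recording_date fall inside date_range?"""
    if date_range is None:
        return True  # no date filtering
    start, end = (to_date(v) for v in date_range)
    d = to_date(recording_date)
    if start is not None and d < start:
        return False
    if end is not None and d > end:  # assumption: end date is inclusive
        return False
    return True

# Open-ended ranges: only a start date, or only an end date.
after_may = in_date_range("2024-05-10", ("2024-05-01", None))
before_may = in_date_range(date(2024, 5, 10), (None, "2024-05-01"))
```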
- opensoundscape.vector_database.get_existing_windows(db, files, deployment_id=None, deployment_name=None, project=None)[source]
retrieve db windows for list of files, filtering by deployment/project
- opensoundscape.vector_database.load_or_create_hoplite_usearch_db(db, embedding_dim=None, cfg=None)[source]
helper function to load or create a hoplite database object
- Parameters:
db – a hoplite database object or a path to a hoplite database (folder). If a path is provided, the database is loaded if it exists and created if it does not; passing embedding_dim is required when creating it. If a hoplite database object is provided, it is returned as is.
embedding_dim – int, dimension of the embeddings to be stored in the database - only required when creating a new database
cfg – optional config_dict.ConfigDict object with usearch configuration; only used when creating a new database. If None, the default usearch config is used. Keys: ‘embedding_dim’, ‘dtype’, ‘metric_name’, ‘expansion_add’, ‘expansion_search’. Example (default values):
```python
from ml_collections import config_dict  # assumed source of config_dict

usearch_cfg = config_dict.ConfigDict()
usearch_cfg.embedding_dim = embedding_dim  # the embedding_dim argument above
usearch_cfg.dtype = 'float16'
usearch_cfg.metric_name = 'IP'
usearch_cfg.expansion_add = 256
usearch_cfg.expansion_search = 128
```
- Returns:
a hoplite database object
- opensoundscape.vector_database.normalize_index_to_tuples(idx, rounding_precision=6)[source]
normalize an index of (filename, start_time, end_time) tuples to account for potential float precision issues and Path vs str differences
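The kind of normalization described here can be sketched as follows. The function normalize_index is a hypothetical stand-in, not the library's implementation; it assumes that paths are compared as strings and that times are rounded to rounding_precision decimal places.

```python
from pathlib import Path

def normalize_index(idx, rounding_precision=6):
    """Hypothetical stand-in: make (file, start, end) tuples directly comparable."""
    return [
        (
            str(Path(f)),                           # Path and str become the same str
            round(float(s), rounding_precision),    # absorb float arithmetic noise
            round(float(e), rounding_precision),
        )
        for f, s, e in idx
    ]

# 0.1 + 0.2 is not exactly 0.3 in floating point, and one index uses Path objects.
a = normalize_index([(Path("audio/rec.wav"), 0.1 + 0.2, 3.0)])
b = normalize_index([("audio/rec.wav", 0.3, 3)])
```

After normalization the two lists compare equal, so membership checks against existing database windows do not miss matches because of representation differences.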
- opensoundscape.vector_database.normalize_windows_to_tuples(windows, rounding_precision=6)[source]
helper function to convert list of hoplite windows to list of (filename, start_time, end_time) tuples
- opensoundscape.vector_database.remove_duplicate_windows(db)[source]
utility function to remove duplicate windows from a hoplite database
duplicate = same (recording, start_time, end_time)
- Parameters:
db – a hoplite database object
- opensoundscape.vector_database.similarity_search_hoplite_db(query_embedding, db, num_results=5, exact_search=False, search_subset_size=None, target_score=None, search_kwargs=None)[source]
Perform a similarity search in the Hoplite database.
- Parameters:
query_embedding – np.ndarray of shape (embedding_dim,) representing the embedding of the query audio clip
db – a Hoplite database containing embeddings from the same model
num_results – The number of results to return for each query
exact_search – default False for usearch (faster), if True uses brute force search
search_subset_size – number of embeddings to compare with. If None [default], searches all embeddings. For floats between 0 and 1, samples that proportion of the database. For ints, samples the specified number of embeddings. Note: only implemented for exact_search=True.
target_score – if specified, searches for similarity scores close to target_score; the default [None] searches for the most similar embeddings
audio_root – root directory for relative paths to query audio files
search_kwargs – dict of additional keyword arguments passed to db.ui.search(), or to brutalism.threaded_brute_search() if exact_search=True. For exact_search=False: radius, threads, exact, log, progress. For exact_search=True: batch_size, max_workers, rng_seed.
**embedding_kwargs – additional keyword arguments passed to self.embed(), such as batch_size and num_workers
- Returns:
A list of dictionaries with the search results, one item per query sample. Each item is a dictionary with the following keys:
"query": dictionary with query metadata
"results": list of dictionaries with metadata for each retrieved sample
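The roles of search_subset_size and target_score in an exact search can be sketched with NumPy over an in-memory array. The function brute_force_search is hypothetical and does not call hoplite. It assumes inner-product scoring (matching the 'IP' metric in the usearch example), and its rng_seed argument only makes the subset sampling repeatable.

```python
import numpy as np

def brute_force_search(query, embeddings, num_results=5,
                       search_subset_size=None, target_score=None, rng_seed=0):
    """Hypothetical exact search; returns (indices, scores) of the selected rows."""
    embeddings = np.asarray(embeddings, dtype=float)
    n = len(embeddings)
    candidates = np.arange(n)
    if search_subset_size is not None:
        if isinstance(search_subset_size, float):
            k = max(1, int(round(search_subset_size * n)))  # proportion of rows
        else:
            k = min(int(search_subset_size), n)             # absolute count
        rng = np.random.default_rng(rng_seed)
        candidates = rng.choice(n, size=k, replace=False)
    # Inner-product similarity between the query and each candidate embedding.
    scores = embeddings[candidates] @ np.asarray(query, dtype=float)
    if target_score is None:
        order = np.argsort(-scores)                         # most similar first
    else:
        order = np.argsort(np.abs(scores - target_score))   # closest to target first
    top = order[:num_results]
    return candidates[top], scores[top]

emb = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]
best_idx, best_scores = brute_force_search([1.0, 0.0], emb, num_results=2)
near_idx, near_scores = brute_force_search([1.0, 0.0], emb, num_results=1, target_score=0.5)
```

With target_score set, the ranking favors scores nearest the target rather than the highest scores, which can help when inspecting borderline matches.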
- opensoundscape.vector_database.windows_to_dataframe(windows, extra_keys=None)[source]
convert list of hoplite windows to a pandas dataframe with relevant info for each window
- Parameters:
windows – list of hoplite window objects, with attributes filename, offsets, datetime, deployment, project, id
extra_keys – optional list of additional attributes to include in the dataframe, if present in the window objects
- Returns:
pandas dataframe with columns for file, start_time, end_time, datetime, deployment, project, window_id, and any extra_keys specified