# Annotations

## Raven

raven.py: Utilities for dealing with Raven files

opensoundscape.raven.annotation_check(directory, col)

Check that rows of Raven annotations files contain class labels

Parameters:

- `directory`: The path which contains Raven annotations file(s)
- `col`: Name of the column containing annotations

Returns: None

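
To illustrate what this check involves, here is a minimal standard-library sketch that scans the text of one Raven selection table (Raven writes these as tab-separated files) and reports data rows whose label column is blank. The helper `rows_missing_labels` and the sample table are hypothetical; OpenSoundscape's own `annotation_check` works on a directory and may report problems or treat whitespace differently.

```python
import csv
import io


def rows_missing_labels(table_text, col):
    """Return 1-based data-row numbers whose `col` value is blank.

    Hypothetical helper: `table_text` is the contents of one Raven
    selection table, which Raven writes as tab-separated text.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    if reader.fieldnames is None or col not in reader.fieldnames:
        raise KeyError(f"column {col!r} not found in selection table")
    missing = []
    for row_number, row in enumerate(reader, start=1):
        # A missing trailing field is read as None; treat it like ""
        value = row.get(col) or ""
        if not value.strip():
            missing.append(row_number)
    return missing


table = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tSpecies\n"
    "1\t0.5\t1.2\tamro\n"
    "2\t3.0\t4.1\t\n"
    "3\t6.2\t7.0\tbcch\n"
)
print(rows_missing_labels(table, "Species"))
```
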
opensoundscape.raven.generate_class_corrections(directory, col)

Generate a CSV to specify any class overrides

Parameters:

- `directory`: The path which contains lowercase Raven annotations file(s)
- `col`: Name of the column containing annotations

Returns:

- `csv` (string): A multiline string containing a CSV file with two columns, `raw` and `corrected`

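
The corrections CSV maps each raw label to a corrected one. A minimal sketch of producing such a template from a list of labels, assuming every label starts out mapped to itself so that only the rows needing an override must be edited by hand. `class_corrections_template` is a hypothetical name, and the real function reads its labels from the files in `directory` rather than from a list.

```python
import csv
import io


def class_corrections_template(labels):
    """Build a 'raw,corrected' CSV string with one row per unique label.

    Hypothetical helper: each label initially maps to itself, so only
    the rows that need an override have to be edited.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["raw", "corrected"])
    for label in sorted(set(labels)):
        writer.writerow([label, label])
    return buffer.getvalue()


print(class_corrections_template(["bcch", "amro", "bcch"]))
```
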
opensoundscape.raven.generate_split_labels_file(directory, col, split_len_s, total_len_s=None, species=None, out_csv=None)

Generate binary labels for a directory of Raven annotations

Given a directory of lowercase Raven annotations, splits the annotations into segments that can be used as labels for machine learning programs that only take short segments.

Parameters:

- `directory`: The path which contains lowercase Raven annotations file(s)
- `col` (str): Name of the column in the Raven files to look for annotations in
- `split_len_s` (int): Length of the segments to break annotations into (e.g. for 5 s: 5)
- `total_len_s` (float): Length of the original files (e.g. for a 5-minute file: 300). If not provided, the length is estimated individually for each file from the end time of its last annotation [default: None]
- `species` (str, list, or None): Species or list of species annotations to look for [default: None]
- `out_csv` (str, optional): Path at which to save the output CSV [default: None]

Returns:

- `all_selections` (pd.DataFrame): A split file of the format shown below; it is also saved to `out_csv` if that is specified.

| filename | start_seg | end_seg | species1 | species2 | … | speciesN |
|---|---|---|---|---|---|---|
| orig/fname1 | 0 | 5 | 0 | 1 | … | 1 |
| orig/fname1 | 5 | 10 | 0 | 0 | … | 1 |
| orig/fname2 | 0 | 5 | 1 | 1 | … | 1 |
| … | | | | | | |

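
To make the split-file format concrete, the sketch below derives one row of 0/1 labels per fixed-length segment for each file and writes the rows as CSV with the standard library. It assumes annotations have already been parsed into `(begin_s, end_s, label)` tuples, counts an annotation for a segment whenever the two overlap at all, and keeps a final partial segment; all three are assumptions, and `split_labels_rows` is a hypothetical helper, not OpenSoundscape's implementation.

```python
import csv
import io
import math


def split_labels_rows(annotations, split_len_s, species, total_len_s=None):
    """Yield (filename, start_seg, end_seg, *one_hot) rows.

    Hypothetical helper: `annotations` maps each filename to a list of
    (begin_s, end_s, label) tuples already read from its Raven table.
    """
    for filename, boxes in annotations.items():
        # Mirror the documented fallback: without total_len_s, estimate
        # the file length from the end time of its last annotation
        if total_len_s is not None:
            length = total_len_s
        else:
            length = max((stop for _, stop, _ in boxes), default=0)
        # Assumption: a final segment shorter than split_len_s is kept
        for i in range(math.ceil(length / split_len_s)):
            start, end = i * split_len_s, (i + 1) * split_len_s
            one_hot = [
                # 1 if any annotation of this species overlaps the segment
                int(any(lab == sp and b < end and e > start for b, e, lab in boxes))
                for sp in species
            ]
            yield (filename, start, end, *one_hot)


rows = list(
    split_labels_rows(
        {"orig/fname1": [(1.0, 2.5, "amro"), (6.0, 7.5, "bcch")]},
        split_len_s=5,
        species=["amro", "bcch"],
    )
)
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["filename", "start_seg", "end_seg", "amro", "bcch"])
writer.writerows(rows)
print(buffer.getvalue())
```
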
opensoundscape.raven.get_labels_in_dataset(selections_files, col)

Get list of all labels in selections_files

Parameters:

- `selections_files` (list): List of Raven selections.txt files
- `col` (str): The name of the column containing the labels

Returns: A list of the unique values found in the label column of this dataset

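
A minimal sketch of collecting the unique labels across several selection tables with the `csv` module. The helper `unique_labels` is hypothetical; skipping blank values and sorting the result are choices made here, not documented behavior of `get_labels_in_dataset`.

```python
import csv
import tempfile
from pathlib import Path


def unique_labels(selections_files, col):
    """Return the sorted unique non-blank values of `col` across files.

    Hypothetical helper: each file is a tab-separated Raven selection table.
    """
    labels = set()
    for path in selections_files:
        with open(path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                value = (row.get(col) or "").strip()
                if value:
                    labels.add(value)
    return sorted(labels)


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.Table.1.selections.txt"
    b = Path(tmp) / "b.Table.1.selections.txt"
    a.write_text("Selection\tSpecies\n1\tamro\n2\tbcch\n")
    b.write_text("Selection\tSpecies\n1\tamro\n2\tnowa\n")
    labels = unique_labels([a, b], "Species")

print(labels)
```
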
opensoundscape.raven.lowercase_annotations(directory, out_dir=None)

Convert Raven annotation files to lowercase and save

Parameters:

- `directory`: The path which contains Raven annotations file(s)
- `out_dir`: The path at which to save the lowercased files; if None, they are saved in `directory`, alongside the original annotations [default: None]

Returns: None

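
A minimal sketch of lowercasing one selection table. It assumes the header row should keep its original case (so column names such as `Begin Time (s)` still match) and that the copy is written with a `.lower` suffix so it cannot overwrite the original when `out_dir` is None; both are assumptions, and `lowercase_selection_table` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def lowercase_selection_table(src, out_dir=None, suffix=".lower"):
    """Write a lowercased copy of one Raven selection table.

    Hypothetical helper: the header row is copied unchanged so column
    names keep their spelling; the ".lower" suffix is an assumed naming
    scheme, not necessarily the one OpenSoundscape uses.
    """
    src = Path(src)
    dest_dir = Path(out_dir) if out_dir is not None else src.parent
    dest = dest_dir / (src.name + suffix)
    lines = src.read_text().splitlines(keepends=True)
    header, body = lines[:1], lines[1:]
    dest.write_text("".join(header + [line.lower() for line in body]))
    return dest


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "site1.Table.1.selections.txt"
    table.write_text("Selection\tSpecies\n1\tAMRO\n2\tBCCH\n")
    out = lowercase_selection_table(table)
    result = out.read_text()

print(result)
```
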
opensoundscape.raven.query_annotations(directory, cls, col, print_out=False)

Given a directory of Raven annotations, query for a specific class

Parameters:

- `directory`: The path which contains lowercase Raven annotations file(s)
- `cls`: The class to query for
- `col`: Name of the column containing annotations
- `print_out`: Format of the output. If True, the output contains delimiters; if False, the output is returned [default: False]

Returns:

- `output` (string): A multiline string containing each annotation file and its rows matching the query `cls`

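
A minimal sketch of the query: read each table with the `csv` module, keep rows whose label column equals `cls`, and join the file path and matching rows into one multiline string. The helper `query_tables` and its output layout are hypothetical; it takes a list of files rather than a directory and does not implement `print_out`.

```python
import csv
import tempfile
from pathlib import Path


def query_tables(selections_files, cls, col):
    """Return a multiline string of the rows whose `col` value equals `cls`.

    Hypothetical helper: each file with at least one match contributes its
    path, its header row, and the matching rows (tab-separated).
    """
    lines = []
    for path in selections_files:
        with open(path, newline="") as f:
            reader = csv.DictReader(f, delimiter="\t")
            matches = [row for row in reader if row.get(col) == cls]
            fields = reader.fieldnames
        if matches:
            lines.append(str(path))
            lines.append("\t".join(fields))
            for row in matches:
                lines.append("\t".join(row[name] for name in fields))
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.Table.1.selections.txt"
    b = Path(tmp) / "b.Table.1.selections.txt"
    a.write_text("Selection\tSpecies\n1\tamro\n2\tbcch\n3\tamro\n")
    b.write_text("Selection\tSpecies\n1\tbcch\n")
    output = query_tables([a, b], "amro", "Species")

print(output)
```
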
opensoundscape.raven.raven_audio_split_and_save(raven_directory, audio_directory, destination, col, sample_rate, clip_duration, clip_overlap=0, final_clip=None, extensions=['wav', 'WAV', 'mp3'], csv_name='labels.csv', labeled_clips_only=False, min_label_len=0, species=None, dry_run=False, verbose=False)

Split audio and annotations files simultaneously

Splits audio into short clips with the desired overlap. Saves these clips and a one-hot encoded labels CSV into the directory of choice. Labels for the CSV are selected based on all labels in the clips.

Requires that audio and annotation filenames are unique, and that the “stem” of annotation filenames is the same as the corresponding stem of the audio filename (Raven saves files using this convention by default).

For example, the following pair of filenames has the correct format:

- `audio_directory/audio_file_1.wav`
- `raven_directory/audio_file_1.Table.1.selections.txt`
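
Because Raven's default names start with the audio file's stem, pairing files reduces to string handling. A minimal sketch with `pathlib`, assuming (from the example filenames above) that the annotation stem is everything before the first `.Table.`; `pair_audio_with_tables` is a hypothetical helper.

```python
from pathlib import Path


def pair_audio_with_tables(audio_files, table_files, extensions=("wav", "WAV", "mp3")):
    """Map each audio path to its Raven table path by shared filename stem.

    Hypothetical helper: a table's stem is taken to be everything before
    the first ".Table." in its name, as in Raven's default naming.
    """
    tables_by_stem = {}
    for table in map(Path, table_files):
        stem = table.name.split(".Table.")[0]
        if stem in tables_by_stem:
            raise ValueError(f"duplicate annotation stem: {stem}")
        tables_by_stem[stem] = table
    pairs = {}
    for audio in map(Path, audio_files):
        # Keep only audio with a recognized extension and a matching table
        if audio.suffix.lstrip(".") in extensions and audio.stem in tables_by_stem:
            pairs[audio] = tables_by_stem[audio.stem]
    return pairs


pairs = pair_audio_with_tables(
    ["audio_directory/audio_file_1.wav", "audio_directory/audio_file_2.mp3"],
    ["raven_directory/audio_file_1.Table.1.selections.txt"],
)
print(pairs)
```
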

Parameters:

- `raven_directory` (str or pathlib.Path): The path which contains lowercase Raven annotations file(s)
- `audio_directory` (str or pathlib.Path): The path which contains audio file(s) with names the same as annotation files
- `destination` (str or pathlib.Path): The path at which to save the splits and the one-hot encoded labels file
- `col` (str): The column containing species labels in the Raven files
- `sample_rate` (int): Desired sample rate of split audio clips
- `clip_duration` (float): Length of each clip
- `clip_overlap` (float): Amount of overlap between subsequent clips [default: 0]
- `final_clip` (str or None): Behavior if the final clip is less than `clip_duration` seconds long [default: None]. By default, the final clip is ignored entirely. Possible options (any other input also ignores the final clip):
  - `"full"`: Increase the overlap with the previous audio to yield a clip of `clip_duration` length
  - `"remainder"`: Include the remainder of the audio (the clip will NOT have `clip_duration` length)
  - `"extend"`: Similar to `"remainder"`, but extend the clip with silence to reach `clip_duration` length
  - `"loop"`: Similar to `"remainder"`, but loop (repeat) the clip to reach `clip_duration` length
- `extensions` (list): List of audio filename extensions to look for [default: `['wav', 'WAV', 'mp3']`]
- `csv_name` (str): Filename of the output CSV, to be saved in `destination` [default: `'labels.csv'`]
- `min_label_len` (float): The minimum amount a label must overlap with the split to be considered a label. Useful for excluding short annotations or annotations that barely overlap the split. For example, if 1, the label will only be included if the annotation is at least 1 s long and either starts at least 1 s before the end of the split, or ends at least 1 s after the start of the split. By default, any label is kept [default: 0]
- `labeled_clips_only` (bool): Whether to only save clips that contain labels of the species of interest [default: False]
- `species` (str, list, or None): Species labels to get. If None, gets a list of labels from all selections files [default: None]
- `dry_run` (bool): If True, skip writing audio and just return the clip DataFrame [default: False]
- `verbose` (bool): If True, prints progress information [default: False]
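
The `final_clip` options are easiest to compare side by side. The sketch below applies them to a plain list of samples rather than an OpenSoundscape `Audio` object; `clip_samples` is hypothetical, silence is represented as zeros, and the leftover audio is taken as everything from the start of the next full clip onward, which may not match how the library computes it.

```python
def clip_samples(samples, sample_rate, clip_duration, clip_overlap=0.0, final_clip=None):
    """Split a list of samples into clips, handling a short final clip.

    Hypothetical helper: mirrors the documented `final_clip` options on a
    plain list; durations are rounded to whole numbers of samples.
    """
    clip_len = int(round(clip_duration * sample_rate))
    step = clip_len - int(round(clip_overlap * sample_rate))
    if step <= 0:
        raise ValueError("clip_overlap must be shorter than clip_duration")
    clips = []
    start = 0
    # Emit every full-length clip, advancing by clip_len minus the overlap
    while start + clip_len <= len(samples):
        clips.append(samples[start:start + clip_len])
        start += step
    # Assumption: the leftover audio is everything from the next clip start on
    remainder = samples[start:]
    if not remainder:
        return clips
    if final_clip == "full":
        # Shift the start back so the clip is clip_len long (if the audio allows)
        clips.append(samples[max(0, len(samples) - clip_len):])
    elif final_clip == "remainder":
        clips.append(remainder)
    elif final_clip == "extend":
        # Pad with silence, represented here as zeros
        clips.append(remainder + [0.0] * (clip_len - len(remainder)))
    elif final_clip == "loop":
        # Repeat the leftover audio, then trim it to clip_len
        repeats = -(-clip_len // len(remainder))  # ceiling division
        clips.append((remainder * repeats)[:clip_len])
    # Any other value (including None) drops the short final clip
    return clips


samples = list(range(10))  # 10 samples at 1 Hz stand in for real audio
for mode in (None, "full", "remainder", "extend", "loop"):
    print(mode, clip_samples(samples, sample_rate=1, clip_duration=4, final_clip=mode)[-1])
```
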

Returns:

opensoundscape.raven.split_single_annotation(raven_file, col, split_len_s, overlap_len_s=0, total_len_s=None, keep_final=False, species=None, min_label_len=0)

Split a Raven selection table into short annotations

Aggregates one-hot annotations for equal-length time segments, drawing annotations from a specified column of a Raven selection table.

Parameters:

- `raven_file` (str): Path to the Raven selections file
- `col` (str): Name of the column in the Raven file to look for annotations in
- `split_len_s` (float): Length of the segments to break annotations into (e.g. for 5 s: 5)
- `overlap_len_s` (float): Length of overlap between segments (e.g. for 2.5 s: 2.5) [default: 0]
- `total_len_s` (float): Length of the original file (e.g. for a 5-minute file: 300). If not provided, the length is estimated from the end time of the last annotation [default: None]
- `keep_final` (bool): Whether to keep annotations from the final clip if it is shorter than `split_len_s`. If using `"remainder"`, `"full"`, `"extend"`, or `"loop"` with `split_and_save`, set this to True; otherwise, set it to False [default: False]
- `species` (str, list, or None): Species or list of species annotations to look for [default: None]
- `min_label_len` (float): The minimum amount a label must overlap with the split to be considered a label. Useful for excluding short annotations or annotations that barely overlap the split. For example, if 1, the label will only be included if the annotation is at least 1 s long and either starts at least 1 s before the end of the split, or ends at least 1 s after the start of the split. By default, any label is kept [default: 0]

Returns:

- `splits_df` (pd.DataFrame): Columns `seg_start`, `seg_end`, and all species; each row contains 1/0 annotations for each species in one segment

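
`split_single_annotation()` generates the start and end times that it passes on to `split_starts_ends()`. A minimal sketch of generating them, assuming segments advance by `split_len_s - overlap_len_s` and that `keep_final=True` adds one shorter segment ending at `total_len_s`; `segment_bounds` is a hypothetical helper, not the library's code.

```python
def segment_bounds(total_len_s, split_len_s, overlap_len_s=0.0, keep_final=False):
    """Return (starts, ends): segment boundaries in seconds.

    Hypothetical helper: full-length segments advance by
    split_len_s - overlap_len_s; with keep_final=True, one shorter
    segment ending at total_len_s is appended.
    """
    step = split_len_s - overlap_len_s
    if step <= 0:
        raise ValueError("overlap_len_s must be smaller than split_len_s")
    starts, ends = [], []
    start = 0.0
    while start + split_len_s <= total_len_s:
        starts.append(start)
        ends.append(start + split_len_s)
        start += step  # repeated float addition can drift for steps like 0.1
    if keep_final and start < total_len_s:
        starts.append(start)
        ends.append(total_len_s)
    return starts, ends


print(segment_bounds(12, 5, overlap_len_s=2.5))
print(segment_bounds(12, 5, overlap_len_s=2.5, keep_final=True))
```
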
opensoundscape.raven.split_starts_ends(raven_file, col, starts, ends, species=None, min_label_len=0)

Split Raven annotations using a list of start and end times

This function takes an array of start times and an array of end times, creating a one-hot encoded labels file by finding all Raven labels that fall within each start and end time pair.

This function is called by split_single_annotation(), which generates lists of start and end times. It is also called by raven_audio_split_and_save(), which gets the lists from metadata about audio files split by opensoundscape.audio.split_and_save.

Parameters:

- `raven_file` (pathlib.Path or str): Path to the selections.txt file
- `col` (str): Name of the column containing annotations
- `starts` (list): Start times of clips
- `ends` (list): End times of clips
- `species` (str or list): Species names for the columns of the one-hot encoded file [default: None]
- `min_label_len` (float): The minimum amount a label must overlap with the split to be considered a label. Useful for excluding short annotations or annotations that barely overlap the split. For example, if 1, the label will only be included if the annotation is at least 1 s long and either starts at least 1 s before the end of the split, or ends at least 1 s after the start of the split. By default, any label is kept [default: 0]

Returns:

- `splits_df` (pd.DataFrame): Columns `seg_start`, `seg_end`, and all unique labels ("species"); one row per segment, containing 1/0 annotations for each potential label
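
A minimal sketch of the one-hot step, reading `min_label_len` as the minimum overlap in seconds between an annotation and a segment (the first sentence of its description); the library's exact boundary rules may differ. `one_hot_segments` is hypothetical, takes already-parsed `(begin_s, end_s, label)` tuples instead of a file, and returns a list of dicts rather than a `pd.DataFrame`.

```python
def one_hot_segments(annotations, starts, ends, species=None, min_label_len=0.0):
    """Return one dict per segment: seg_start, seg_end, and 1/0 per label.

    Hypothetical helper: an annotation counts for a segment when its
    overlap with the segment is positive and at least min_label_len seconds.
    """
    if species is None:
        # Fall back to every unique label found in the annotations
        species = sorted({label for _, _, label in annotations})
    elif isinstance(species, str):
        species = [species]
    rows = []
    for start, end in zip(starts, ends):
        row = {"seg_start": start, "seg_end": end}
        for sp in species:
            # Overlap (in seconds) of each annotation of this species
            overlaps = (
                min(end, stop) - max(start, begin)
                for begin, stop, label in annotations
                if label == sp
            )
            row[sp] = int(any(o > 0 and o >= min_label_len for o in overlaps))
        rows.append(row)
    return rows


annotations = [(0.5, 1.0, "amro"), (4.6, 7.0, "bcch")]
for row in one_hot_segments(annotations, [0, 5], [5, 10], min_label_len=1):
    print(row)
```
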

## Taxa

A set of utilities for converting between scientific and common names of bird species in different naming systems (xeno-canto and BirdNET).

opensoundscape.taxa.bn_common_to_sci(common)

Convert a BirdNET common name (ignoring dashes, spaces, and case) to a scientific name in lowercase-hyphenated form
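
One way to ignore dashes, spaces, and case is to normalize both the stored names and the query before comparing them. A minimal sketch with a hypothetical two-entry table (the real module loads its own species table); `normalize_common_name` and `common_to_sci_sketch` are illustrative names.

```python
import re


def normalize_common_name(name):
    """Lowercase a common name and drop whitespace and dashes."""
    return re.sub(r"[\s\-]", "", name.lower())


# Hypothetical two-row table; the real module loads its own species table
COMMON_TO_SCI = {
    normalize_common_name("American Robin"): "turdus-migratorius",
    normalize_common_name("Black-capped Chickadee"): "poecile-atricapillus",
}


def common_to_sci_sketch(common):
    """Look up a scientific name after normalizing the common name."""
    return COMMON_TO_SCI[normalize_common_name(common)]


print(common_to_sci_sketch("black capped chickadee"))
```
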

opensoundscape.taxa.common_to_sci(common)

Convert a BirdNET common name (ignoring dashes, spaces, and case) to a scientific name in lowercase-hyphenated form

opensoundscape.taxa.get_species_list()

List of scientific names (lowercase-hyphenated) of the species in the loaded species table

opensoundscape.taxa.sci_to_bn_common(scientific)

Convert a lowercase-hyphenated scientific name to a BirdNET common name formatted as `lowercasenospaces`

opensoundscape.taxa.sci_to_xc_common(scientific)

Convert a lowercase-hyphenated scientific name to a xeno-canto common name formatted as `lowercasenospaces`

opensoundscape.taxa.xc_common_to_sci(common)

Convert a xeno-canto common name (ignoring dashes, spaces, and case) to a scientific name in lowercase-hyphenated form