Annotations

Raven

raven.py: Utilities for dealing with Raven files

opensoundscape.raven.annotation_check(directory, col)

Check that rows of Raven annotations files contain class labels

Parameters:
  • directory – The path which contains Raven annotations file(s)
  • col – Name of column containing annotations
Returns:

None
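The check described above can be pictured as scanning each selection table for rows whose label cell is empty. This is a minimal sketch, not opensoundscape's implementation. It assumes Raven's default tab-delimited selection table with a header row, and the helper name `find_unlabeled_rows` and the sample table are hypothetical.

```python
import csv
import io


def find_unlabeled_rows(table_text, col):
    """Return the 1-based data-row numbers whose `col` cell is missing or blank.

    Hypothetical helper: assumes a tab-delimited Raven selection table
    with a header row, as Raven writes by default.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    if col not in (reader.fieldnames or []):
        raise KeyError(f"column {col!r} not found in header")
    missing = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(col)
        # Short rows yield None for absent cells; blank strings also count.
        if value is None or not value.strip():
            missing.append(row_number)
    return missing


# A tiny made-up selection table: the second selection has no label.
table = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tSpecies\n"
    "1\t0.5\t2.0\tbird\n"
    "2\t3.0\t4.5\t\n"
)
print(find_unlabeled_rows(table, "Species"))  # [2]
```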

opensoundscape.raven.generate_class_corrections(directory, col)

Generate a CSV to specify any class overrides

Parameters:
  • directory – The path which contains lowercase Raven annotations file(s)
  • col – Name of column containing annotations
Returns:

A multiline string containing a CSV file with two columns: raw and corrected

Return type:

csv (string)
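One way a CSV with raw and corrected columns might be consumed is to load it into a mapping and apply it to labels. This sketch assumes a header row `raw,corrected` and that labels without an entry are left unchanged; `load_corrections` and `apply_corrections` are hypothetical names, not part of opensoundscape.

```python
import csv
import io


def load_corrections(csv_text):
    """Build a {raw: corrected} mapping from a two-column CSV (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["raw"]: row["corrected"] for row in reader}


def apply_corrections(labels, corrections):
    """Replace each label with its correction; labels without an entry pass through."""
    return [corrections.get(label, label) for label in labels]


corrections_csv = "raw,corrected\nwood thrush,woth\nwoth?,woth\n"
mapping = load_corrections(corrections_csv)
print(apply_corrections(["wood thrush", "woth?", "veery"], mapping))
# ['woth', 'woth', 'veery']
```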

opensoundscape.raven.generate_split_labels_file(directory, col, split_len_s, total_len_s=None, species=None, out_csv=None)

Generate binary labels for a directory of Raven annotations

Given a directory of lowercase Raven annotations, splits the annotations into segments that can be used as labels for machine learning programs that only take short segments.

Parameters:
  • directory – The path which contains lowercase Raven annotations file(s)
  • col (str) – name of column in Raven file to look for annotations in
  • split_len_s (int) – length of segments to break annotations into (e.g. for 5s: 5)
  • total_len_s (float) – length of original files (e.g. for 5-minute file: 300). If not provided, estimates length individually for each file based on end time of last annotation [default: None]
  • species (str, list, or None) – species or list of species annotations to look for [default: None]
  • out_csv (str) – (optional) path at which to save the output CSV [default: None]
Returns:

split file of the format:

  filename, start_seg, end_seg, species1, species2, …, speciesN
  orig/fname1, 0, 5, 0, 1, …, 1
  orig/fname1, 5, 10, 0, 0, …, 1
  orig/fname2, 0, 5, 1, 1, …, 1
  …

saves all_selections to out_csv if this is specified

Return type:

all_selections (pd.DataFrame)
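When total_len_s is not provided, each file's length is estimated from the end time of its last annotation. The sketch below shows that estimate and the resulting split_len_s segment boundaries. The helper `segment_bounds` is hypothetical, and rounding the segment count up so the last annotation is still covered is an assumption, not documented library behavior.

```python
import math


def segment_bounds(annotation_end_times, split_len_s, total_len_s=None):
    """Return (start, end) pairs of consecutive, non-overlapping segments.

    Hypothetical helper. Assumptions: if total_len_s is None, the file length
    is estimated as the end time of the last annotation, and the number of
    segments is rounded up so that the last annotation is still covered.
    """
    if total_len_s is None:
        total_len_s = max(annotation_end_times)
    n_segments = math.ceil(total_len_s / split_len_s)
    return [(i * split_len_s, (i + 1) * split_len_s) for i in range(n_segments)]


# Annotations ending at 3.2 s, 7.9 s and 12.4 s, with 5 s segments:
print(segment_bounds([3.2, 7.9, 12.4], 5))
# [(0, 5), (5, 10), (10, 15)]
print(segment_bounds([3.2], 5, total_len_s=10))
# [(0, 5), (5, 10)]
```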

opensoundscape.raven.get_labels_in_dataset(selections_files, col)

Get list of all labels in selections_files

Parameters:
  • selections_files (list) – list of Raven selections.txt files
  • col (str) – the name of the column containing the labels
Returns:

a list of the unique values found in the label column of this dataset
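Collecting the unique labels amounts to reading the label column of every selection table and taking the set of values. This is a minimal sketch, assuming tab-delimited tables with a header row; the helper `unique_labels`, skipping blank cells, and sorting the result are choices made here for illustration, not confirmed library behavior.

```python
import csv
import tempfile
from pathlib import Path


def unique_labels(selections_files, col):
    """Collect the distinct values of `col` across tab-delimited selection tables.

    Hypothetical helper; blank cells are skipped and the result is sorted.
    """
    labels = set()
    for path in selections_files:
        with open(path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                value = (row.get(col) or "").strip()
                if value:
                    labels.add(value)
    return sorted(labels)


with tempfile.TemporaryDirectory() as tmp:
    header = "Selection\tBegin Time (s)\tEnd Time (s)\tSpecies\n"
    a = Path(tmp) / "a.Table.1.selections.txt"
    b = Path(tmp) / "b.Table.1.selections.txt"
    a.write_text(header + "1\t0.0\t1.0\tveery\n2\t2.0\t3.0\twoth\n")
    b.write_text(header + "1\t0.5\t1.5\twoth\n")
    print(unique_labels([a, b], "Species"))  # ['veery', 'woth']
```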

opensoundscape.raven.lowercase_annotations(directory, out_dir=None)

Convert Raven annotation files to lowercase and save

Parameters:
  • directory – The path which contains Raven annotations file(s)
  • out_dir – The path at which to save the lowercased files. If None, saves them in directory, alongside the original annotations [default: None]
Returns:

None
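Lowercasing lets later steps match labels such as “WOTH” and “woth” and column headers such as “begin time (s)” consistently. The sketch below assumes the whole file (header and values) is lowercased and that, with out_dir=None, the copy overwrites the original under the same name; both are assumptions, and `lowercase_selection_table` is a hypothetical helper that handles one file rather than a directory.

```python
import tempfile
from pathlib import Path


def lowercase_selection_table(src, out_dir=None):
    """Write a lowercased copy of one Raven selection table and return its path.

    Hypothetical helper. Assumptions: the whole file (header and values) is
    lowercased, and the copy keeps the original filename, so leaving
    out_dir as None overwrites the file in place.
    """
    src = Path(src)
    dst = src if out_dir is None else Path(out_dir) / src.name
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(src.read_text().lower())
    return dst


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "site1.Table.1.selections.txt"
    src.write_text("Selection\tBegin Time (s)\tSpecies\n1\t0.5\tWOTH\n")
    out = lowercase_selection_table(src, out_dir=Path(tmp) / "lower")
    print(out.read_text().splitlines())
    # ['selection\tbegin time (s)\tspecies', '1\t0.5\twoth']
```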

opensoundscape.raven.query_annotations(directory, cls, col, print_out=False)

Given a directory of Raven annotations, query for a specific class

Parameters:
  • directory – The path which contains lowercase Raven annotations file(s)
  • cls – The class which you would like to query for
  • col – Name of column containing annotations
  • print_out – Format of output. If True, output contains delimiters. If False, returns output [default: False]

Returns:

A multiline string containing annotation file and rows matching the query cls

Return type:

output (string)
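A query of this kind filters each table for rows whose label column equals cls and reports them under the file they came from. This sketch works on table contents held in memory rather than on a directory, and its output layout (filename line followed by tab-joined matching rows) is an assumption; `query_tables` is a hypothetical helper.

```python
import csv
import io


def query_tables(tables, cls, col):
    """Return a multiline string listing, per table, the rows whose `col` equals `cls`.

    Hypothetical helper: `tables` maps a filename to the tab-delimited text of a
    Raven selection table; tables with no matching rows are omitted.
    """
    lines = []
    for name, text in tables.items():
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        header = next(reader)
        idx = header.index(col)
        matches = [row for row in reader if len(row) > idx and row[idx] == cls]
        if matches:
            lines.append(name)
            lines.extend("\t".join(row) for row in matches)
    return "\n".join(lines)


tables = {
    "a.Table.1.selections.txt": "selection\tspecies\n1\twoth\n2\tveery\n",
    "b.Table.1.selections.txt": "selection\tspecies\n1\tveery\n",
}
print(repr(query_tables(tables, "woth", "species")))
# 'a.Table.1.selections.txt\n1\twoth'
```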

opensoundscape.raven.raven_audio_split_and_save(raven_directory, audio_directory, destination, col, sample_rate, clip_duration, clip_overlap=0, final_clip=None, extensions=['wav', 'WAV', 'mp3'], csv_name='labels.csv', labeled_clips_only=False, min_label_len=0, species=None, dry_run=False, verbose=False)

Split audio and annotations files simultaneously

Splits audio into short clips with the desired overlap, and saves these clips along with a one-hot encoded labels CSV into the directory of choice. The labels for each clip in the CSV are taken from all annotations that fall within that clip.

Requires that audio and annotation filenames are unique, and that the “stem” of annotation filenames is the same as the corresponding stem of the audio filename (Raven saves files using this convention by default).

For example, the following pair of filenames follows this convention:

  audio_directory/audio_file_1.wav
  raven_directory/audio_file_1.Table.1.selections.txt
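The stem convention above can be checked by stripping everything from “.Table.” onward in the annotation filename and comparing the result with the audio file's stem. This is a minimal sketch assuming Raven's default “<audio stem>.Table.<n>.selections.txt” naming and unique stems; `match_annotation_to_audio` is a hypothetical helper, not an opensoundscape function.

```python
from pathlib import Path


def match_annotation_to_audio(annotation_names, audio_names):
    """Pair each audio file with the selection table that shares its stem.

    Hypothetical helper. Assumes annotation filenames follow Raven's default
    "<audio stem>.Table.<n>.selections.txt" pattern and that stems are unique.
    """
    audio_by_stem = {Path(name).stem: name for name in audio_names}
    pairs = {}
    for ann in annotation_names:
        # Everything before ".Table." is taken as the audio stem.
        stem = Path(ann).name.split(".Table.")[0]
        if stem in audio_by_stem:
            pairs[audio_by_stem[stem]] = ann
    return pairs


print(match_annotation_to_audio(
    ["raven_directory/audio_file_1.Table.1.selections.txt"],
    ["audio_directory/audio_file_1.wav", "audio_directory/audio_file_2.wav"],
))
# {'audio_directory/audio_file_1.wav': 'raven_directory/audio_file_1.Table.1.selections.txt'}
```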

Parameters:
  • raven_directory (str or pathlib.Path) – The path which contains lowercase Raven annotations file(s)
  • audio_directory (str or pathlib.Path) – The path which contains audio file(s) with names the same as annotation files
  • destination (str or pathlib.Path) – The path at which to save the splits and the one-hot encoded labels file
  • col (str) – The column containing species labels in the Raven files
  • sample_rate (int) – Desired sample rate of split audio clips
  • clip_duration (float) – Length of each clip, in seconds
  • clip_overlap (float) – Amount of overlap between subsequent clips, in seconds [default: 0]
  • final_clip (str or None) –

    Behavior if the final clip is less than clip_duration seconds long. By default, ignores the final clip entirely. Possible options (any other input also ignores the final clip entirely) [default: None]:

    • “full”: Increase the overlap with the previous audio to yield a clip with clip_duration length
    • “remainder”: Include the remainder of the audio (the clip will NOT have clip_duration length)
    • “extend”: Similar to remainder, but extend the clip with silence to reach clip_duration length
    • “loop”: Similar to remainder, but loop (repeat) the clip to reach clip_duration length
  • extensions (list) – List of audio filename extensions to look for. [default: [‘wav’, ‘WAV’, ‘mp3’]]
  • csv_name (str) – Filename of the output csv, to be saved in the specified destination [default: ‘labels.csv’]
  • min_label_len (float) – the minimum amount (in seconds) a label must overlap with the split to be considered a label. Useful for excluding short annotations or annotations that barely overlap the split. For example, if 1, the label will only be included if the annotation is at least 1s long, starts at least 1s before the end of the split, and ends at least 1s after the start of the split. By default, any label is kept [default: 0]
  • labeled_clips_only (bool) – Whether to only save clips that contain labels of the species of interest. [default: False]
  • species (str, list, or None) – Species labels to get. If None, gets a list of labels from all selections files. [default: None]
  • dry_run (bool) – If True, skip writing audio and just return clip DataFrame [default: False]
  • verbose (bool) – If True, prints progress information [default: False]
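The clip_overlap and final_clip options above determine which time ranges of a file become clips. The sketch below computes those ranges for one file. It assumes clip starts advance by clip_duration - clip_overlap, and it treats “extend” and “loop” like “remainder” on the assumption that they only pad or repeat the audio drawn from the same source range. `clip_bounds` is a hypothetical helper, not the library's splitting code.

```python
def clip_bounds(file_len_s, clip_duration, clip_overlap=0, final_clip=None):
    """Return (start, end) times in seconds for the clips cut from one file.

    Hypothetical helper. Clip starts advance by clip_duration - clip_overlap.
    "extend" and "loop" are treated like "remainder" because, as assumed here,
    they only pad or repeat the audio and draw on the same source time range.
    """
    file_len_s = float(file_len_s)
    step = clip_duration - clip_overlap
    if step <= 0:
        raise ValueError("clip_overlap must be smaller than clip_duration")
    bounds = []
    start = 0.0
    while start + clip_duration <= file_len_s:
        bounds.append((start, start + clip_duration))
        start += step
    covered_to = bounds[-1][1] if bounds else 0.0
    if covered_to < file_len_s:  # some audio at the end is not in any full clip
        if final_clip == "full":
            # Slide the last clip back so it ends exactly at the end of the file.
            bounds.append((max(file_len_s - clip_duration, 0.0), file_len_s))
        elif final_clip in ("remainder", "extend", "loop"):
            bounds.append((start, file_len_s))
        # Any other value (including None) drops the partial final clip.
    return bounds


print(clip_bounds(12, 5))                          # [(0.0, 5.0), (5.0, 10.0)]
print(clip_bounds(12, 5, final_clip="full"))       # [(0.0, 5.0), (5.0, 10.0), (7.0, 12.0)]
print(clip_bounds(12, 5, final_clip="remainder"))  # [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```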

Returns:

opensoundscape.raven.split_single_annotation(raven_file, col, split_len_s, overlap_len_s=0, total_len_s=None, keep_final=False, species=None, min_label_len=0)

Split a Raven selection table into short annotations

Aggregate one-hot annotations for equal-length time segments, drawing annotations from a specified column of a Raven selection table

Parameters:
  • raven_file (str) – path to Raven selections file
  • col (str) – name of column in Raven file to look for annotations in
  • split_len_s (float) – length of segments to break annotations into (e.g. for 5s: 5)
  • overlap_len_s (float) – length of overlap between segments (e.g. for 2.5s: 2.5)
  • total_len_s (float) – length of original file (e.g. for 5-minute file: 300). If not provided, estimates length based on end time of last annotation [default: None]
  • keep_final (bool) – whether to keep annotations from the final clip if the final clip is less than split_len_s long. If using “remainder”, “full”, “extend”, or “loop” with split_and_save, set this to True; otherwise, set it to False. [default: False]
  • species (str, list, or None) – species or list of species annotations to look for [default: None]
  • min_label_len (float) – the minimum amount (in seconds) a label must overlap with the split to be considered a label. Useful for excluding short annotations or annotations that barely overlap the split. For example, if 1, the label will only be included if the annotation is at least 1s long, starts at least 1s before the end of the split, and ends at least 1s after the start of the split. By default, any label is kept [default: 0]
Returns:

columns ‘seg_start’, ‘seg_end’, and all species, with each row containing 1/0 annotations for each species in a segment

Return type:

splits_df (pd.DataFrame)

opensoundscape.raven.split_starts_ends(raven_file, col, starts, ends, species=None, min_label_len=0)

Split Raven annotations using a list of start and end times

This function takes an array of start times and an array of end times, creating a one-hot encoded labels file by finding all Raven labels that fall within each start and end time pair.

This function is called by split_single_annotation(), which generates lists of start and end times. It is also called by raven_audio_split_and_save(), which gets the lists from metadata about audio files split by opensoundscape.audio.split_and_save.

Parameters:
  • raven_file (pathlib.Path or str) – path to selections.txt file
  • col (str) – name of column containing annotations
  • starts (list) – start times of clips
  • ends (list) – end times of clips
  • species (str or list) – species names for columns of one-hot encoded file [default: None]
  • min_label_len (float) – the minimum amount (in seconds) a label must overlap with the split to be considered a label. Useful for excluding short annotations or annotations that barely overlap the split. For example, if 1, the label will only be included if the annotation is at least 1s long, starts at least 1s before the end of the split, and ends at least 1s after the start of the split. By default, any label is kept [default: 0]
Returns:

columns: ‘seg_start’, ‘seg_end’, and all unique labels (‘species’)

rows: one per segment, containing 1/0 annotations for each potential label

Return type:

splits_df (pd.DataFrame)
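The one-hot step described above can be sketched with plain tuples instead of a selections file and DataFrame. Here an annotation counts for a segment when it ends more than min_label_len after the segment starts and begins more than min_label_len before the segment ends; that reading of min_label_len, the tuple input format, and the helper name `one_hot_segments` are assumptions for illustration.

```python
def one_hot_segments(annotations, starts, ends, species, min_label_len=0):
    """Build one-hot rows for each (start, end) segment.

    Hypothetical helper. `annotations` is a list of (begin_s, end_s, label)
    tuples. An annotation counts for a segment when it ends more than
    min_label_len after the segment starts and begins more than
    min_label_len before the segment ends (an assumed reading).
    """
    rows = []
    for seg_start, seg_end in zip(starts, ends):
        present = {
            label
            for begin, end, label in annotations
            if end > seg_start + min_label_len and begin < seg_end - min_label_len
        }
        row = {"seg_start": seg_start, "seg_end": seg_end}
        row.update({sp: int(sp in present) for sp in species})
        rows.append(row)
    return rows


annotations = [(1.0, 3.0, "woth"), (4.5, 5.5, "veery"), (7.0, 9.0, "woth")]
# The veery call overlaps each segment by only 0.5 s, so min_label_len=1 drops it.
print(one_hot_segments(annotations, [0, 5], [5, 10], ["veery", "woth"], min_label_len=1))
# [{'seg_start': 0, 'seg_end': 5, 'veery': 0, 'woth': 1}, {'seg_start': 5, 'seg_end': 10, 'veery': 0, 'woth': 1}]
```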

Species Table

Taxa

A set of utilities for converting between scientific and common names of bird species in different naming systems (xeno-canto and BirdNET)

opensoundscape.taxa.bn_common_to_sci(common)

Convert a BirdNET common name (ignoring dashes, spaces, and case) to a lowercase-hyphenated scientific name

opensoundscape.taxa.common_to_sci(common)

Convert a BirdNET common name (ignoring dashes, spaces, and case) to a lowercase-hyphenated scientific name
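Matching a common name while ignoring dashes, spaces, and case amounts to normalizing both the query and the table keys before lookup. The sketch below uses a hypothetical two-row stand-in for the species table that opensoundscape loads; `normalize_common` and `common_to_sci_sketch` are illustrative names, not the library's functions.

```python
def normalize_common(name):
    """Lowercase and drop dashes and spaces, e.g. 'Wood-Thrush' -> 'woodthrush'."""
    return name.lower().replace("-", "").replace(" ", "")


# Hypothetical two-row stand-in for the species table opensoundscape loads.
COMMON_TO_SCI = {
    normalize_common("Wood Thrush"): "hylocichla-mustelina",
    normalize_common("Veery"): "catharus-fuscescens",
}


def common_to_sci_sketch(common):
    """Look up a lowercase-hyphenated scientific name from a common name."""
    return COMMON_TO_SCI[normalize_common(common)]


print(common_to_sci_sketch("wood-thrush"))  # hylocichla-mustelina
print(common_to_sci_sketch("WOOD THRUSH"))  # hylocichla-mustelina
```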

opensoundscape.taxa.get_species_list()

List of scientific names (lowercase-hyphenated) of the species in the loaded species table

opensoundscape.taxa.sci_to_bn_common(scientific)

Convert a lowercase-hyphenated scientific name to a BirdNET common name, lowercase with no spaces

opensoundscape.taxa.sci_to_xc_common(scientific)

Convert a lowercase-hyphenated scientific name to a xeno-canto common name, lowercase with no spaces

opensoundscape.taxa.xc_common_to_sci(common)

Convert a xeno-canto common name (ignoring dashes, spaces, and case) to a lowercase-hyphenated scientific name