{
"cells": [
{
"cell_type": "markdown",
"id": "7416bd7c",
"metadata": {},
"source": [
"# Agile Bioacoustic Modeling with SongSpace\n",
"\n",
"SongSpace provides a workflow for active or \"agile\" learning for bioacoustics data. Embed audio into a databse, query the database with vector search or classifieres, and select clips for active learning review or final verification for ecological analyses. \n",
"\n",
"Embeddings are saved in a HopLite database. The same folder storing the (sql) embedding database will also store classifiers and tables for labeled datasets. The full workspace can be saved and loaded with ss.save(path) and SongSpace.load(path). \n"
]
},
{
"cell_type": "markdown",
"id": "7e4c6df3",
"metadata": {},
"source": [
"## Run this tutorial\n",
"\n",
"If running in Colab, uncomment the installation line below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ab51e7e7",
"metadata": {},
"outputs": [],
"source": [
"# if 'google.colab' in str(get_ipython()):\n",
"# %pip install \"opensoundscape==0.12.1\" \"bioacoustics-model-zoo==0.12.0\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "352a2a8b",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"import bioacoustics_model_zoo as bmz\n",
"\n",
"from opensoundscape.annotations import BoxedAnnotations\n",
"from opensoundscape.vector_database import load_or_create_hoplite_usearch_db\n",
"from opensoundscape.ml.song_space import SongSpace\n",
"from opensoundscape.ml.shallow_classifier import select_from_hoplite\n",
"from opensoundscape.visualization import annotate, inspect"
]
},
{
"cell_type": "markdown",
"id": "1f281981",
"metadata": {},
"source": [
"#### Download example files\n",
"Download a set of aquatic soundscape recordings with annotations of _Rana sierrae_ vocalizations\n",
"\n",
"Option 1: run the cell below\n",
"\n",
"- if you get a 403 error, DataDryad suspects you are a bot. Use Option 2. \n",
"\n",
"Option 2:\n",
"\n",
"- Download and unzip the `rana_sierrae_2022.zip` folder containing audio and annotations from this [public Dryad dataset](https://datadryad.org/stash/dataset/doi:10.5061/dryad.9s4mw6mn3#readme)\n",
"- Move the unzipped `rana_sierrae_2022` folder into the current folder"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1d43f6c3",
"metadata": {},
"outputs": [],
"source": [
"# # Note: the \"!\" preceding each line below allows us to run bash commands in a Jupyter notebook\n",
"# # If you are not running this code in a notebook, input these commands into your terminal instead\n",
"# !wget -O rana_sierrae_2022.zip https://datadryad.org/stash/downloads/file_stream/2722802;\n",
"# !unzip rana_sierrae_2022;"
]
},
{
"cell_type": "markdown",
"id": "f1b666b8",
"metadata": {},
"source": [
"#### Prepare audio data\n",
"See the train_cnn.ipynb tutorial for step-by-step walkthrough of this process, or just run the cells below to prepare a training set."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f3765908",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/SML161/opensoundscape/opensoundscape/annotations.py:347: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.\n",
" all_annotations_df = pd.concat(all_file_dfs).reset_index(drop=True)\n"
]
}
],
"source": [
"# Set this variable to specify where the folder `rana_sierrae_2022` is located:\n",
"dataset_path = Path(\"./rana_sierrae_2022/\")\n",
"\n",
"# let's generate clip labels of 5s duration (to match HawkEars) using the raven annotations\n",
"# and some utility functions from opensoundscape\n",
"from opensoundscape.annotations import BoxedAnnotations\n",
"\n",
"audio_and_raven_files = pd.read_csv(f\"{dataset_path}/audio_and_raven_files.csv\")\n",
"# update the paths to where we have the audio and raven files stored\n",
"audio_and_raven_files[\"audio\"] = audio_and_raven_files[\"audio\"].apply(\n",
" lambda x: f\"{dataset_path}/{x}\"\n",
")\n",
"audio_and_raven_files[\"raven\"] = audio_and_raven_files[\"raven\"].apply(\n",
" lambda x: f\"{dataset_path}/{x}\"\n",
")\n",
"\n",
"annotations = BoxedAnnotations.from_raven_files(\n",
" raven_files=audio_and_raven_files[\"raven\"],\n",
" audio_files=audio_and_raven_files[\"audio\"],\n",
" annotation_column=\"annotation\",\n",
")\n",
"# generate labels for 5s clips, including any labels that overlap by at least 0.2 seconds\n",
"labels = annotations.clip_labels(\n",
" clip_duration=3, min_label_overlap=0.2, final_clip=None\n",
")"
]
},
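{
"cell_type": "markdown",
"id": "c4a91e07",
"metadata": {},
"source": [
"A minimal sketch of the overlap rule used when labeling clips above, assuming a clip receives a label when it shares at least `min_label_overlap` seconds with an annotation. The helper `overlap_seconds` and the example annotation are hypothetical and only illustrate the idea; this is not the opensoundscape implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5be2d8f1",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch only; not the opensoundscape implementation.\n",
"def overlap_seconds(clip_start, clip_end, ann_start, ann_end):\n",
"    # duration (in seconds) shared by a clip and an annotation\n",
"    return max(0.0, min(clip_end, ann_end) - max(clip_start, ann_start))\n",
"\n",
"\n",
"clip_duration = 3.0\n",
"min_label_overlap = 0.2\n",
"annotation = (2.9, 4.0)  # hypothetical annotation (start, end) in seconds\n",
"\n",
"clip_has_label = {}\n",
"for clip_start in (0.0, 3.0, 6.0):\n",
"    clip_end = clip_start + clip_duration\n",
"    shared = overlap_seconds(clip_start, clip_end, *annotation)\n",
"    # the clip is labeled only if enough of the annotation falls inside it\n",
"    clip_has_label[(clip_start, clip_end)] = shared >= min_label_overlap\n",
"\n",
"print(clip_has_label)"
]
},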
{
"cell_type": "markdown",
"id": "0b83317b",
"metadata": {},
"source": [
"## Prepare labels"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2a831276",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/SML161/opensoundscape/opensoundscape/annotations.py:347: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.\n",
" all_annotations_df = pd.concat(all_file_dfs).reset_index(drop=True)\n"
]
}
],
"source": [
"dataset_path = Path(\"./rana_sierrae_2022/\")\n",
"audio_and_raven_files = pd.read_csv(dataset_path / \"audio_and_raven_files.csv\")\n",
"audio_and_raven_files[\"audio\"] = audio_and_raven_files[\"audio\"].apply(\n",
" lambda x: str(dataset_path / x)\n",
")\n",
"audio_and_raven_files[\"raven\"] = audio_and_raven_files[\"raven\"].apply(\n",
" lambda x: str(dataset_path / x)\n",
")\n",
"\n",
"annotations = BoxedAnnotations.from_raven_files(\n",
" raven_files=audio_and_raven_files[\"raven\"],\n",
" audio_files=audio_and_raven_files[\"audio\"],\n",
" annotation_column=\"annotation\",\n",
")\n",
"\n",
"labels = annotations.clip_labels(clip_duration=3, min_label_overlap=0.2)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0ff5ac69",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"seed_train: (4, 1)\n",
"validation: (536, 1)\n",
"pool: (2148, 1)\n"
]
}
],
"source": [
"target_source_class = \"C\"\n",
"target_model_class = \"RanaSierrae_C\"\n",
"\n",
"# start with one recording of target class\n",
"binary_labels = labels[[target_source_class]].rename(\n",
" columns={target_source_class: target_model_class}\n",
")\n",
"seed_train = binary_labels.loc[\n",
" [\"rana_sierrae_2022/mp3/sine2022a_MSD-0558_20220623_060000_0-10s.mp3\"]\n",
"]\n",
"other = binary_labels.drop(seed_train.index)\n",
"validation, unlabeled = train_test_split(other, test_size=0.8, random_state=0)\n",
"\n",
"print(\"seed_train:\", seed_train.shape)\n",
"print(\"validation:\", validation.shape)\n",
"print(\"pool:\", unlabeled.shape)"
]
},
{
"cell_type": "markdown",
"id": "1876817c",
"metadata": {},
"source": [
"All audio clips from the single audio file we'll start with for positives:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "146752f6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
"\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
"
\n",
" \n",
"
\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"_ = inspect(seed_train, bandpass_range=(0, 2500))"
]
},
{
"cell_type": "markdown",
"id": "390830fa",
"metadata": {},
"source": [
"## Build database and SongSpace\n",
"\n",
"The default Machine Learning embedding model is Perch V2, a TensorFlow model provided via the Bioacoustics Model Zoo. If you wish to avoid installing TensorFlow, consider specifying `feature_extractor='perch2_onnx'` for an ONNX formatted version (currently CPU only), or selecting another model such as 'birdnet' (TFLite) or 'bs-convnext' (PyTorch). Alternatively, advanced users can provide a custom embedding model with a .embed() method matching the opensoundscape.CNN.embed() API.\n",
"\n",
"It is critical to maintain consistency of the machine learning model within a single SongSpace: you cannot change embedding models (or model versions) for different datasets or tasks within a single SongSpace. You'll need to make a new SongSpace and re-ingest all audio files if you change backbone embedding models. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8fec817a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connected to existing database with 2,691 embeddings from 672 files.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/SML161/miniconda3/envs/opso_dev/lib/python3.13/site-packages/bioacoustics_model_zoo/perch_v2.py:215: UserWarning: Disabling TensorFlow's XLA compilation (setting tf.config.optimizer.set_jit(False)) because otherwise TF models on Mac hang at runtime as of Tensorflow 2.21.0\n",
" warnings.warn(\n"
]
}
],
"source": [
"ss = SongSpace(\"./Perch2SongSpace\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "cb59c932",
"metadata": {},
"outputs": [],
"source": [
"import opensoundscape as opso\n",
"\n",
"opso.set_seed(0)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c97de950",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"all samples already have embeddings in the database\n",
"all samples already have embeddings in the database\n",
"all samples already have embeddings in the database\n"
]
},
{
"data": {
"text/plain": [
"['round1_train', 'validation', 'pool_unlabeled']"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Embed and register datasets in SongSpace.\n",
"ss.ingest_audio(\n",
" seed_train,\n",
" dataset_name=\"round1_train\",\n",
" batch_size=32,\n",
")\n",
"ss.ingest_audio(\n",
" validation,\n",
" dataset_name=\"validation\",\n",
" allow_training=False,\n",
" batch_size=32,\n",
")\n",
"ss.ingest_audio(\n",
" unlabeled,\n",
" dataset_name=\"pool_unlabeled\",\n",
" batch_size=32,\n",
")\n",
"\n",
"ss.list_datasets()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "dcbbc04d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Saved SongSpace to ./Perch2SongSpace with 0 classifiers and 3 datasets.\n"
]
}
],
"source": [
"ss.save()"
]
},
{
"cell_type": "markdown",
"id": "144d3e87",
"metadata": {},
"source": [
"## Similarity search for similar samples"
]
},
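{
"cell_type": "markdown",
"id": "9d27e3ab",
"metadata": {},
"source": [
"As rough intuition for what a vector similarity search does, the sketch below ranks toy embeddings by cosine similarity to a query using brute-force NumPy. The function `top_k_cosine` and the random vectors are hypothetical; the actual search is handled by the database's own index, and its similarity metric may differ."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e18b6c40",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative sketch only: brute-force cosine similarity on toy vectors.\n",
"rng = np.random.default_rng(0)\n",
"database = rng.normal(size=(100, 16))  # 100 hypothetical stored embeddings\n",
"query = database[7] + 0.01 * rng.normal(size=16)  # near-copy of row 7\n",
"\n",
"\n",
"def top_k_cosine(query, database, k=5):\n",
"    # normalize to unit length so the dot product equals cosine similarity\n",
"    db_unit = database / np.linalg.norm(database, axis=1, keepdims=True)\n",
"    q_unit = query / np.linalg.norm(query)\n",
"    scores = db_unit @ q_unit\n",
"    # indices of the k highest-scoring rows, best match first\n",
"    order = np.argsort(scores)[::-1][:k]\n",
"    return order, scores[order]\n",
"\n",
"\n",
"indices, scores = top_k_cosine(query, database)\n",
"print(indices, scores)"
]
},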
{
"cell_type": "code",
"execution_count": 14,
"id": "f8f533a3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"embedding query samples\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/SML161/opensoundscape/opensoundscape/ml/cnn.py:2955: UserWarning: The columns of input samples df differ from `model.classes`. Discarding sample df columns.\n",
" warnings.warn(\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3e5fb70900324b65a99afd55e520a29a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/4 [00:00, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
"I0000 00:00:1782311451.465561 4562402 service.cc:153] XLA service 0x393fcb4c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n",
"I0000 00:00:1782311451.465592 4562402 service.cc:161] StreamExecutor [0]: Host, Default Version (Driver: 0.0.0; Runtime: 0.0.0; Toolkit: 0.0.0; DNN: 0.0.0)\n",
"I0000 00:00:1782311451.726795 4562402 dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n",
"W0000 00:00:1782311452.145856 4591712 cpp_gen_intrinsics.cc:74] Empty bitcode string provided for eigen. Optimizations relying on this IR will be disabled.\n",
"I0000 00:00:1782311452.146537 4591712 rsqrt.cc:179] Falling back to 1 / sqrt(x) for f32 false\n",
"I0000 00:00:1782311452.566058 4562402 device_compiler.h:208] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"performing similarity search for each of 4 query samples\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "cdc031b8effe4cfdb5fc99e25d8e1947",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"GridBox(children=(VBox(children=(HTML(value='\n",
"\n",
"