{ "cells": [ { "cell_type": "markdown", "id": "e5d6e344-3ed7-4b0a-aa36-2b83d4842bff", "metadata": {}, "source": [ "# Train a CNN\n", "\n", "Convolutional neural networks (CNNs) are popular tools for creating automated machine learning classifiers on images or image-like samples. By converting audio into a two-dimensional frequency vs. time representation such as a spectrogram, we can generate image-like samples that can be used to train CNNs.\n", "\n", "This tutorial demonstrates the basic use of OpenSoundscape's `preprocessors` and `cnn` modules for training CNNs and making predictions using CNNs.\n", "\n", "Under the hood, OpenSoundscape uses PyTorch for machine learning tasks. By using the class `opensoundscape.ml.cnn.CNN`, you can train and predict with PyTorch's powerful CNN architectures in just a few lines of code.\n", "\n", "## Run this tutorial\n", "\n", "This tutorial is more than a reference! It's a Jupyter Notebook which you can run and modify on Google Colab or your own computer.\n", "\n", "|Link to tutorial|How to run tutorial|\n", "| :- | :- |\n", "| [Open in Google Colab](https://colab.research.google.com/github/kitzeslab/opensoundscape/blob/master/docs/tutorials/train_cnn.ipynb) | The link opens the tutorial in Google Colab. The first code cell installs OpenSoundscape automatically when it runs in Colab. |\n", "| [Download the notebook](https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/kitzeslab/opensoundscape/blob/master/docs/tutorials/train_cnn.ipynb) | The link downloads the tutorial file to your computer. Follow the [Jupyter installation instructions](https://opensoundscape.org/en/latest/installation/jupyter.html), then open the tutorial file in Jupyter. 
|" ] }, { "cell_type": "code", "execution_count": null, "id": "b52ecca1-702b-4fa3-a48b-61025f55d8fd", "metadata": {}, "outputs": [], "source": [ "# if this is a Google Colab notebook, install opensoundscape in the runtime environment\n", "if 'google.colab' in str(get_ipython()):\n", "    %pip install \"opensoundscape==0.12.1\" \"jupyter-client<8,>=5.3.4\" \"ipykernel==6.17.1\"\n", "    num_workers=0\n", "else:\n", "    num_workers=4" ] }, { "cell_type": "markdown", "id": "c4d88b73-77d1-4c00-a83a-8466fd79e15e", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "id": "59c9eee8-c65c-4df1-95d0-15dda341ee0a", "metadata": {}, "source": [ "### Import needed packages" ] }, { "cell_type": "code", "execution_count": 2, "id": "972e3e01-c85f-415d-95cc-9b695332f738", "metadata": {}, "outputs": [], "source": [ "# the cnn module provides classes for training/predicting with various types of CNNs\n", "from opensoundscape import CNN\n", "\n", "# other utilities and packages\n", "import torch\n", "import pandas as pd\n", "from pathlib import Path\n", "import numpy as np\n", "import random\n", "import subprocess\n", "from glob import glob\n", "import sklearn\n", "\n", "# set up plotting\n", "from matplotlib import pyplot as plt\n", "plt.rcParams['figure.figsize']=[15,5] # for large visuals\n", "%config InlineBackend.figure_format = 'retina'" ] }, { "cell_type": "markdown", "id": "22adf5d6-403d-4a06-bc85-477cdc60ec07", "metadata": {}, "source": [ "### Set random seeds\n", "\n", "Set manual seeds for PyTorch, NumPy, and Python's `random` module. These essentially \"fix\" the results of any stochastic steps in model training, ensuring that training results are reproducible. You probably don't want to do this when you actually train your model, but it's useful for debugging." 
] }, { "cell_type": "code", "execution_count": 3, "id": "68e09bd5-e86d-44e0-8ffa-0f8ee699c31f", "metadata": {}, "outputs": [], "source": [ "torch.manual_seed(0)\n", "random.seed(0)\n", "np.random.seed(0)" ] }, { "cell_type": "markdown", "id": "e1c60bac-280a-4d72-80b6-2659f6ecd83d", "metadata": {}, "source": [ "### Download files\n", "\n", "Training a machine learning model requires some pre-labeled data. These data, in the form of audio recordings or spectrograms, are labeled with whether or not they contain the sound of the species of interest.\n", "\n", "These data can be obtained from online databases such as Xeno-Canto.org, or by labeling one's own autonomous recording unit (ARU) data using a program like Cornell's Raven sound analysis software. In this example we use a set of avian soundscape recordings that were annotated using the software Raven Pro 1.6.4 (Bioacoustics Research Program 2022):\n", "\n", "
An annotated set of audio recordings of Eastern North American birds containing frequency, time, and species information. Lauren M. Chronister, Tessa A. Rhinehart, Aidan Place, Justin Kitzes.\n", "https://doi.org/10.1002/ecy.3329 \n", "\n", "\n", "These are the same data that are used by the annotation and preprocessing tutorials, so you can skip this step if you've already downloaded them there." ] }, { "cell_type": "markdown", "id": "947448da", "metadata": {}, "source": [ "### Download example files\n", "\n", "Download a set of example audio files and Raven annotations:\n", "\n", "Option 1: run the cell below\n", "\n", "- If you get a 403 error, DataDryad suspects you are a bot. Use Option 2.\n", "\n", "Option 2:\n", "\n", "- Download and unzip both `annotation_Files.zip` and `mp3_Files.zip` from https://datadryad.org/stash/dataset/doi:10.5061/dryad.d2547d81z \n", "- Move the unzipped contents into a subfolder of the current folder called `./annotated_data/`, so that the two folders sit at `./annotated_data/annotation_Files/` and `./annotated_data/mp3_Files/`" ] }, { "cell_type": "code", "execution_count": 4, "id": "7d8bf5cf-6c0b-43d6-a3bc-62657597fbec", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2024-10-08 13:17:47-- https://datadryad.org/stash/downloads/file_stream/641805\n", "Resolving datadryad.org (datadryad.org)... 52.25.192.224, 34.211.245.249, 35.82.66.187, ...\n", "Connecting to datadryad.org (datadryad.org)|52.25.192.224|:443... connected.\n", "HTTP request sent, awaiting response... 403 Forbidden\n", "2024-10-08 13:17:49 ERROR 403: Forbidden.\n", "\n", "--2024-10-08 13:17:50-- https://datadryad.org/stash/downloads/file_stream/641807\n", "Resolving datadryad.org (datadryad.org)... 34.211.245.249, 35.82.66.187, 52.36.117.254, ...\n", "Connecting to datadryad.org (datadryad.org)|34.211.245.249|:443... connected.\n", "HTTP request sent, awaiting response... 
403 Forbidden\n", "2024-10-08 13:17:50 ERROR 403: Forbidden.\n", "\n", "mkdir: annotated_data: File exists\n", "Archive: annotation_Files.zip\n", " End-of-central-directory signature not found. Either this file is not\n", " a zipfile, or it constitutes one disk of a multi-part archive. In the\n", " latter case the central directory and zipfile comment will be found on\n", " the last disk(s) of this archive.\n", "unzip: cannot find zipfile directory in one of annotation_Files.zip or\n", " annotation_Files.zip.zip, and cannot find annotation_Files.zip.ZIP, period.\n", "Archive: mp3_Files.zip\n", " End-of-central-directory signature not found. Either this file is not\n", " a zipfile, or it constitutes one disk of a multi-part archive. In the\n", " latter case the central directory and zipfile comment will be found on\n", " the last disk(s) of this archive.\n", "unzip: cannot find zipfile directory in one of mp3_Files.zip or\n", " mp3_Files.zip.zip, and cannot find mp3_Files.zip.ZIP, period.\n" ] } ], "source": [ "# Note: the \"!\" preceding each line below allows us to run bash commands in a Jupyter notebook\n", "# If you are not running this code in a notebook, input these commands into your terminal instead\n", "!wget -O annotation_Files.zip https://datadryad.org/stash/downloads/file_stream/641805;\n", "!wget -O mp3_Files.zip https://datadryad.org/stash/downloads/file_stream/641807;\n", "!mkdir annotated_data;\n", "!unzip annotation_Files.zip -d ./annotated_data/annotation_Files;\n", "!unzip mp3_Files.zip -d ./annotated_data/mp3_Files;" ] }, { "cell_type": "markdown", "id": "82705d0a-f5f7-4104-8ea7-461ca7f72e4e", "metadata": {}, "source": [ "## Prepare audio data\n", "\n", "To prepare audio data for machine learning, we need to convert our annotated data into clip-level labels.\n", "\n", "These steps are covered in depth in other tutorials, so we'll just set our clip labels up quickly for this example.\n", "\n", "First, get exactly matched lists of audio files and 
their corresponding selection files:" ] }, { "cell_type": "code", "execution_count": null, "id": "61cbd28e-1e20-4709-95e7-dadf7f8b3f2c", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Path to the folder where the dataset was downloaded and unzipped\n", "dataset_path = Path(\"./annotated_data/\")\n", "\n", "# Make a list of all of the selection table files\n", "selection_files = glob(f\"{dataset_path}/annotation_Files/*/*.txt\")\n", "\n", "# Create a list of audio files, one corresponding to each Raven file\n", "# (Audio files have the same names as selection files with a different extension)\n", "audio_files = [\n", "    f.replace(\"annotation_Files\", \"mp3_Files\").replace(\n", "        \".Table.1.selections.txt\", \".mp3\"\n", "    )\n", "    for f in selection_files\n", "]" ] }, { "cell_type": "markdown", "id": "adc6709e-9508-4f08-b1ea-30d8662161b1", "metadata": {}, "source": [ "Next, convert the selection files and audio files to a `BoxedAnnotations` object, which contains the time, frequency, and label information for all annotations for every recording in the dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "77f3f7a5-e074-4313-a1bd-6b5a4c98612e", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/SML161/opensoundscape/opensoundscape/annotations.py:300: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. 
To retain the old behavior, exclude the relevant entries before the concat operation.\n", " all_annotations_df = pd.concat(all_file_dfs).reset_index(drop=True)\n" ] } ], "source": [ "from opensoundscape.annotations import BoxedAnnotations\n", "\n", "# Load all Raven selection tables into a single BoxedAnnotations object\n", "annotations = BoxedAnnotations.from_raven_files(\n", "    raven_files=selection_files, audio_files=audio_files, annotation_column=\"Species\"\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "id": "0b8c74cb-3fbf-4f29-8ed5-d62f51b645a4", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%capture\n", "# Parameters to use for label creation\n", "clip_duration = 3  # length of each clip, in seconds\n", "clip_overlap = 0  # overlap between consecutive clips, in seconds\n", "min_label_overlap = 0.25  # minimum overlap between an annotation and a clip for the clip to receive the label\n", "species_of_interest = [\"NOCA\", \"EATO\", \"SCTA\", \"BAWW\", \"BCCH\", \"AMCR\", \"NOFL\"]\n", "\n", "# Create dataframe of one-hot labels\n", "clip_labels = annotations.clip_labels(\n", "    clip_duration=clip_duration,\n", "    clip_overlap=clip_overlap,\n", "    min_label_overlap=min_label_overlap,\n", "    class_subset=species_of_interest # You can comment this line out if you want to include all species.\n", ")" ] }, { "cell_type": "code", "execution_count": 11, "id": "71d2b3ae-a37b-4e2a-a0c0-4bd41fce40ae", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| file | start_time | end_time | NOCA | EATO | SCTA | BAWW | BCCH | AMCR | NOFL |\n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| annotated_data/mp3_Files/Recording_1/Recording_1_Segment_31.mp3 | 0.0 | 3.0 | False | True | False | False | False | False | False |\n",
"| | 3.0 | 6.0 | False | False | False | False | False | False | False |\n",
"| | 6.0 | 9.0 | False | True | False | False | False | False | False |\n",
"| | 9.0 | 12.0 | False | False | False | False | False | False | False |\n",
"| | 12.0 | 15.0 | False | False | False | False | False | True | False |\n", "
/Users/SML161/opensoundscape/docs/tutorials/wandb/run-20241008_131926-701x1t52"
],
"text/plain": [
"