{ "cells": [ { "cell_type": "markdown", "id": "cba6163c-6803-4852-9f01-9508d451bac0", "metadata": {}, "source": [ "# Preprocess audio samples\n", "\n", "While the `CNN` class in OpenSoundscape contains a default Preprocessor, you may want to modify or create your own Preprocessor depending on the specific way you wish to generate samples. \n", "\n", "Note that the default preprocessor is a good starting point for training, and if you're using a pre-trained CNN for prediction you don't need to (and probably shouldn't!) modify the preprocessing. So, if you just want to train or predict with CNNs, you might not need to delve into the depths of this tutorial. However, for those trying to create high-performing custom models, using custom preprocessing is a powerful way to improve their performance.\n", "\n", "This tutorial describes how you can use two important types of objects in OpenSoundscape to modify preprocessing.\n", "\n", "* `Preprocessors` in OpenSoundscape perform all of the preprocessing steps from loading a file from disk, up to providing a sample to the machine learning algorithm for training or prediction. They are designed to be flexible and customizable. These classes are used internally by classes such as `opensoundscape.ml.cnn.CNN` when (a) training a machine learning model in OpenSoundscape, or (b) making predictions with a machine learning model in OpenSoundscape. \n", "* `Datasets` are PyTorch's way of handling a list of inputs to preprocess. In OpenSoundscape, there are two built-in classes (`AudioFileDataset` and `AudioSplittingDataset`) which use a Preprocessor to generate samples from a list of file paths. \n", "\n", "\n", "## Run this tutorial\n", "\n", "This tutorial is more than a reference! 
It's a Jupyter Notebook which you can run and modify on Google Colab or your own computer.\n", "\n", "|Link to tutorial|How to run tutorial|\n", "| :- | :- |\n", "| [](https://colab.research.google.com/github/kitzeslab/opensoundscape/blob/master/docs/tutorials/preprocess_audio_dataset.ipynb) | The link opens the tutorial in Google Colab. Uncomment the \"installation\" line in the first cell to install OpenSoundscape. |\n", "| [](https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/kitzeslab/opensoundscape/blob/master/docs/tutorials/preprocess_audio_dataset.ipynb) | The link downloads the tutorial file to your computer. Follow the [Jupyter installation instructions](https://opensoundscape.org/en/latest/installation/jupyter.html), then open the tutorial file in Jupyter. |\n", "\n", "\n", "## Intro to custom preprocessing\n", "\n", "Preprocessors are designed to be flexible and modular, so that each step of the preprocessing pipeline can be modified or removed. This notebook demonstrates:\n", "\n", "- preparation of audio data to be used by a preprocessor\n", "- how \"Actions\" are strung together in a Preprocessor to define how samples are generated\n", "- modifying the parameters of actions\n", "- turning Actions on and off\n", "- modifying the order and contents of a Preprocessor\n", "- use of the `SpectrogramPreprocessor` class, including examples of:\n", " * modifying audio and spectrogram parameters\n", " * changing the output image shape\n", " * changing the output type\n", " * turning augmentation on and off\n", " * modifying augmentation parameters\n", " * using the \"overlay\" augmentation\n", "- writing custom preprocessors and actions\n", "\n", "it also uses the Dataset classes to demonstrate\n", "- how to load one sample per file path\n", "- how to load long audio files as a series of shorter clips\n", "\n", "\n", "### How to access preprocessors\n", "When training a CNN model in OpenSoundscape, you will create an object of the CNN class. 
There are two ways to modify the preprocessing:\n", "\n", "1) Modify the model.preprocessor directly.\n", "\n", " The model contains a preprocessor object that you can modify, for instance:\n", " ```python\n", " model.preprocessor.pipeline.bandpass.bypass = True\n", " ```\n", "\n", "2) Overwrite the preprocessor with a new one:\n", "\n", " ```python\n", " my_preprocessor = SpectrogramPreprocessor(....) #this tutorial explains how to make a preprocessor\n", " #... modify it as desired...\n", " model.preprocessor = my_preprocessor\n", " ```\n", "\n", "### Notes on augmentations\n", "While training, the CNN class will use all actions in the preprocessor's pipeline. When running validation or prediction, by default, the CNN will bypass any actions with `action.is_augmentation==True`.\n", "\n", "Note that if you want to create a preprocessor with overlay augmentation, it's easiest to use option 2 above and initialize the preprocessor with an `overlay_df`. " ] }, { "cell_type": "markdown", "id": "a4290151", "metadata": {}, "source": [ "## Information for PyTorch Users\n", "If you're looking to use OpenSoundscape's preprocessing tools, but use PyTorch (or Jax) directly for the rest of your training workflows, this section is for you. \n", "\n", "The `opensoundscape.ml.datasets.AudioFileDataset` and `opensoundscape.ml.datasets.AudioSplittingDataset` classes are subclasses of torch's `Dataset` and can often be used as drop-in substitutions; just use a `DataLoader` collate function that returns the typical PyTorch DataLoader format: a tuple of (samples, labels) where each is a tensor with a leading batch dimension. A collate function with this behavior is provided in `opensoundscape.ml.utils`. 
Here's a quick example:\n", "\n", "```python\n", "from torch.utils.data import DataLoader\n", "from opensoundscape import AudioFileDataset, SpectrogramPreprocessor\n", "from opensoundscape.ml.utils import collate_audio_samples_to_tensors\n", "\n", "preprocessor = SpectrogramPreprocessor(sample_duration=2, height=224, width=224)\n", "audio_dataset = AudioFileDataset(label_df, preprocessor)\n", "\n", "train_dataloader = DataLoader(\n", " audio_dataset,\n", " batch_size=64,\n", " shuffle=True,\n", " collate_fn=collate_audio_samples_to_tensors\n", ")\n", "```\n", "\n" ] }, { "cell_type": "markdown", "id": "38c5223d", "metadata": { "tags": [] }, "source": [ "## Set up tutorial" ] }, { "cell_type": "code", "execution_count": null, "id": "2ebe88c2-f086-4d26-a943-4391e33f7e83", "metadata": {}, "outputs": [], "source": [ "# if this is a Google Colab notebook, install opensoundscape in the runtime environment\n", "if 'google.colab' in str(get_ipython()):\n", " %pip install \"opensoundscape==0.12.1\" \"jupyter-client<8,>=5.3.4\" \"ipykernel==6.17.1\"" ] }, { "cell_type": "markdown", "id": "b8dd95fc-976f-4a1d-b4e4-1d164dfd3b3e", "metadata": {}, "source": [ "First, import some packages." 
] }, { "cell_type": "code", "execution_count": null, "id": "4d255395-b2c8-4fdb-88f8-b2748a7919f0", "metadata": {}, "outputs": [], "source": [ "# Preprocessor classes are used to load, transform, and augment audio samples for use in a machine learning model\n", "from opensoundscape.preprocess.preprocessors import SpectrogramPreprocessor\n", "from opensoundscape.ml.datasets import AudioFileDataset, AudioSplittingDataset\n", "from opensoundscape import preprocess\n", "\n", "# helper function for displaying a sample as an image\n", "from opensoundscape.preprocess.utils import show_tensor, show_tensor_grid\n", "\n", "\n", "# other utilities and packages\n", "import torch\n", "import pandas as pd\n", "from pathlib import Path\n", "import numpy as np\n", "import random\n", "import subprocess\n", "import IPython.display as ipd" ] }, { "cell_type": "markdown", "id": "959aa7bb-6a4b-412a-a5a2-b1bda0f84f95", "metadata": {}, "source": [ "Set up plotting " ] }, { "cell_type": "code", "execution_count": 3, "id": "795e8954-8fc2-4bb6-95f3-c27bded476aa", "metadata": {}, "outputs": [], "source": [ "#set up plotting\n", "from matplotlib import pyplot as plt\n", "plt.rcParams['figure.figsize']=[15,5] #for large visuals\n", "%config InlineBackend.figure_format = 'retina'" ] }, { "cell_type": "markdown", "id": "ed4b2333-26f7-45f3-b291-7ad1f27ac77a", "metadata": {}, "source": [ "Set manual seeds for pytorch and python. These ensure the training results are reproducible. You probably don't want to do this when you actually train your model, but it's useful for debugging." 
] }, { "cell_type": "code", "execution_count": 4, "id": "cd9ccb14-6471-4179-bf84-4325ffd353b9", "metadata": {}, "outputs": [], "source": [ "torch.manual_seed(0)\n", "np.random.seed(0)\n", "random.seed(0)" ] }, { "cell_type": "markdown", "id": "92c7eecb-2bc5-4335-8cc5-46e5503c635d", "metadata": {}, "source": [ "### Get example audio data\n", "\n", "\n", "The Kitzes Lab has created a small labeled dataset of short clips of American Woodcock vocalizations. You have two options for obtaining the folder of data, called `woodcock_labeled_data`:\n", "\n", "1. Run the following cell to download this small dataset. These commands require you to have `tar` installed on your computer, as they will download and unzip a compressed file in `.tar.gz` format. \n", "\n", "2. Download a `.zip` version of the files by clicking [here](https://pitt.box.com/shared/static/m0cmzebkr5qc49q9egxnrwwp50wi8zu5.zip). You will have to unzip this folder and place the unzipped folder in the same folder that this notebook is in.\n", "\n", "**Note**: Once you have the data, you do not need to run this cell again." 
] }, { "cell_type": "code", "execution_count": 5, "id": "794a7de4-996e-4398-8b51-3cc72766f1f9", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", " 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\n", "100 9499k 100 9499k 0 0 1194k 0 0:00:07 0:00:07 --:--:-- 2637k\n" ] }, { "data": { "text/plain": [ "CompletedProcess(args=['rm', 'woodcock_labeled_data.tar.gz'], returncode=0)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subprocess.run(\n", " [\n", " \"curl\",\n", " \"https://drive.google.com/uc?export=download&id=1Ly2M--dKzpx331cfUFdVuiP96QKGJz_P\",\n", " \"-L\",\n", " \"-o\",\n", " \"woodcock_labeled_data.tar.gz\",\n", " ]\n", ") # Download the data\n", "subprocess.run(\n", " [\"tar\", \"-xzf\", \"woodcock_labeled_data.tar.gz\"]\n", ") # Unzip the downloaded tar.gz file\n", "subprocess.run(\n", " [\"rm\", \"woodcock_labeled_data.tar.gz\"]\n", ") # Remove the file after its contents are unzipped" ] }, { "cell_type": "markdown", "id": "00dfb7ee-4f01-4654-bdd9-0d9e0f32bbf1", "metadata": {}, "source": [ "### Load dataframe of files and labels\n", "We need a dataframe with file paths in the index, so we manipulate the included one_hot_labels.csv slightly:" ] }, { "cell_type": "code", "execution_count": 6, "id": "21c429a3-bcc8-4083-b7ac-9d52bb19d2b8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| file | present | absent |
|---|---|---|
| ./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav | 1 | 0 |
| ./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav | 0 | 1 |
| ./woodcock_labeled_data/79678c979ebb880d5ed6d56f26ba69ff.wav | 1 | 0 |
| ./woodcock_labeled_data/49890077267b569e142440fa39b3041c.wav | 1 | 0 |
| ./woodcock_labeled_data/0c453a87185d8c7ce05c5c5ac5d525dc.wav | 1 | 0 |