{
    "cells": [
        {
            "cell_type": "markdown",
            "id": "357397da-2398-4735-a4b3-9333a48e3161",
            "metadata": {},
            "source": [
                "# Customize CNN training\n",
                "\n",
                "This notebook demonstrates how to use classes from `opensoundscape.ml.cnn` and architectures created using `opensoundscape.ml.cnn_architectures` to\n",
                "\n",
                "- choose between single-target and multi-target model behavior\n",
                "- modify learning rates, learning rate decay schedule, and regularization\n",
                "- choose from various CNN architectures\n",
                "- train a multi-target model with a special loss function\n",
                "- use strategic sampling for imbalanced training data\n",
                "- customize preprocessing: train on spectrograms with a bandpassed frequency range\n",
                "\n",
                "Rather than demonstrating their effects on training (model training is slow!), most examples in this notebook either don't train the model or \"train\" it for 0 epochs for the purpose of demonstration.\n",
                "\n",
                "For an introductory demonstration of model training, please see the [\"Train a CNN\" tutorial](https://opensoundscape.org/en/latest/tutorials/train_cnn.html). For a demo of how to apply a trained model to a dataset, see the [\"Predict with pretrained CNNs\" tutorial](https://opensoundscape.org/en/latest/tutorials/predict_with_cnn.html).\n",
                "\n",
                "For a hands-on walkthrough of machine learning for bioacoustics, use the [\"Classifiers 101 Guide\"](https://opensoundscape.org/en/latest/classifier_guide/guide.html)\n",
                "\n",
                "## Run this tutorial\n",
                "\n",
                "This tutorial is more than a reference! It's a Jupyter Notebook which you can run and modify on Google Colab or your own computer.\n",
                "\n",
                "|Link to tutorial|How to run tutorial|\n",
                "| :- | :- |\n",
                "| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kitzeslab/opensoundscape/blob/master/docs/tutorials/customize_cnn_training.ipynb) | The link opens the tutorial in Google Colab. Uncomment the \"installation\" line in the first cell to install OpenSoundscape. |\n",
                "| [![Download via DownGit](https://img.shields.io/badge/GitHub-Download-teal?logo=github)](https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/kitzeslab/opensoundscape/blob/master/docs/tutorials/customize_cnn_training.ipynb) | The link downloads the tutorial file to your computer. Follow the [Jupyter installation instructions](https://opensoundscape.org/en/latest/installation/jupyter.html), then open the tutorial file in Jupyter. |"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "b7bb1cf8-823c-49ae-8266-77cd8150f28e",
            "metadata": {},
            "outputs": [],
            "source": [
                "# if this is a Google Colab notebook, install opensoundscape in the runtime environment\n",
                "if 'google.colab' in str(get_ipython()):\n",
                "  %pip install \"opensoundscape==0.12.1\" \"jupyter-client<8,>=5.3.4\" \"ipykernel==6.17.1\""
            ]
        },
        {
            "cell_type": "markdown",
            "id": "25011ade-9721-424d-a554-d3212dfc7571",
            "metadata": {},
            "source": [
                "## Setup\n",
                "\n",
                "### Import needed packages"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "id": "eb11d417-c381-4950-a78c-65e8a512a786",
            "metadata": {},
            "outputs": [],
            "source": [
                "from opensoundscape.preprocess import preprocessors\n",
                "from opensoundscape.ml import cnn, cnn_architectures\n",
                "\n",
                "import torch\n",
                "import pandas as pd\n",
                "from pathlib import Path\n",
                "import numpy as np\n",
                "import random \n",
                "import subprocess\n",
                "\n",
                "from matplotlib import pyplot as plt\n",
                "plt.rcParams['figure.figsize']=[15,5] #for big visuals\n",
                "%config InlineBackend.figure_format = 'retina'"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "5eb3e483-1160-4abb-a774-7ca5f612dd52",
            "metadata": {},
            "source": [
                "### Download labeled audio files\n",
                "\n",
                "The Kitzes Lab has created a small labeled dataset of short clips of American Woodcock vocalizations. You have two options for obtaining the folder of data, called `woodcock_labeled_data`:\n",
                "\n",
                "1. Run the following cell to download this small dataset. These commands require you to have `tar` installed on your computer, as they will download and unzip a compressed file in `.tar.gz` format. \n",
                "\n",
                "2. Download a `.zip` version of the files by clicking [here](https://pitt.box.com/shared/static/m0cmzebkr5qc49q9egxnrwwp50wi8zu5.zip). You will have to unzip this folder and place the unzipped folder in the same folder that this notebook is in."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "811c6470-044b-4acd-a3dc-1492c7a1e39f",
            "metadata": {},
            "outputs": [
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": [
                        "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
                        "                                 Dload  Upload   Total   Spent    Left  Speed\n",
                        "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n",
                        "100 9499k  100 9499k    0     0  1316k      0  0:00:07  0:00:07 --:--:-- 2435k\n"
                    ]
                },
                {
                    "data": {
                        "text/plain": [
                            "CompletedProcess(args=['rm', 'woodcock_labeled_data.tar.gz'], returncode=0)"
                        ]
                    },
                    "execution_count": 2,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "subprocess.run(\n",
                "    [\n",
                "        \"curl\",\n",
                "        \"https://drive.google.com/uc?export=download&id=1Ly2M--dKzpx331cfUFdVuiP96QKGJz_P\",\n",
                "        \"-L\",\n",
                "        \"-o\",\n",
                "        \"woodcock_labeled_data.tar.gz\",\n",
                "    ]\n",
                ")  # Download the data\n",
                "subprocess.run(\n",
                "    [\"tar\", \"-xzf\", \"woodcock_labeled_data.tar.gz\"]\n",
                ")  # Unzip the downloaded tar.gz file\n",
                "subprocess.run(\n",
                "    [\"rm\", \"woodcock_labeled_data.tar.gz\"]\n",
                ")  # Remove the file after its contents are unzipped"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "fe28c0b3-f481-4d90-b3be-647c3d68fed5",
            "metadata": {},
            "source": [
                "### Prepare audio data\n",
                "\n",
                "To create a machine learning model, we need two dataframes of labeled clips, one for training and one for testing. \n",
                "\n",
                "The steps to create these dataframes are described in more detail in other tutorials (e.g. the [\"Audio annotations\" tutorial](tutorials/annotations.html)).\n",
                "\n",
                "First, we need a dataframe with file paths in the index, so we manipulate the included `one_hot_labels.csv` slightly."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "83f851bc-acad-47f5-a595-3f6ce6765dd7",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>present</th>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>file</th>\n",
                            "      <th></th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>./woodcock_labeled_data/d4c40b6066b489518f8da83af1ee4984.wav</th>\n",
                            "      <td>1</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>./woodcock_labeled_data/e84a4b60a4f2d049d73162ee99a7ead8.wav</th>\n",
                            "      <td>0</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>./woodcock_labeled_data/79678c979ebb880d5ed6d56f26ba69ff.wav</th>\n",
                            "      <td>1</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>./woodcock_labeled_data/49890077267b569e142440fa39b3041c.wav</th>\n",
                            "      <td>1</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>./woodcock_labeled_data/0c453a87185d8c7ce05c5c5ac5d525dc.wav</th>\n",
                            "      <td>1</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "                                                    present\n",
                            "file                                                       \n",
                            "./woodcock_labeled_data/d4c40b6066b489518f8da83...        1\n",
                            "./woodcock_labeled_data/e84a4b60a4f2d049d73162e...        0\n",
                            "./woodcock_labeled_data/79678c979ebb880d5ed6d56...        1\n",
                            "./woodcock_labeled_data/49890077267b569e142440f...        1\n",
                            "./woodcock_labeled_data/0c453a87185d8c7ce05c5c5...        1"
                        ]
                    },
                    "execution_count": 3,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# Load one-hot labels dataframe\n",
                "labels = pd.read_csv(\"./woodcock_labeled_data/one_hot_labels.csv\").set_index(\"file\")[\n",
                "    [\"present\"]\n",
                "]\n",
                "\n",
                "# Prepend the folder location to the file paths\n",
                "labels.index = pd.Series(labels.index).apply(lambda f: \"./woodcock_labeled_data/\" + f)\n",
                "\n",
                "# Create class list\n",
                "classes = labels.columns\n",
                "\n",
                "# Inspect\n",
                "labels.head()"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "b329af17-b1ff-4fb8-9e78-4ebb35f7ae74",
            "metadata": {},
            "source": [
                "Next, randomly split these data into train and validation sets."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "958f6209-516c-42e0-8c0a-b6989c90bd30",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "created train_df (len 23) and valid_df (len 6)\n"
                    ]
                }
            ],
            "source": [
                "from sklearn.model_selection import train_test_split\n",
                "\n",
                "train_df, valid_df = train_test_split(labels, test_size=0.2, random_state=0)\n",
                "print(f\"created train_df (len {len(train_df)}) and valid_df (len {len(valid_df)})\")"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "0dbc6207-4ab4-41f4-9016-de6f0577b6d3",
            "metadata": {},
            "source": [
                "## Model architectures\n",
                "\n",
                "We initialize a model object by specifying the architecture, a list of classes, and the duration of individual samples in seconds.\n",
                "\n",
                "The `architecture` is the particular design of the CNN. This option can either be a string matching one of the architectures available by default in OpenSoundscape, or a custom PyTorch model object.\n",
                "\n",
                "### Default architectures\n",
                "The `opensoundscape.ml.cnn_architectures` module provides functions to create several common CNN architectures. These architectures are built into PyTorch, but the OpenSoundscape module helps us out by reshaping the final layer to match the number of classes we have. \n",
                "\n",
                "Note that these will use default architecture parameters, including using pre-trained ImageNet weights. If you don't want to use pre-trained weights, follow the method below of creating the architecture and passing it to the initialization of CNN.\n",
                "\n",
                "See what architectures are available by default in OpenSoundscape:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "444ab229-eac7-4d80-b972-5d2ac3849c7a",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "['resnet18',\n",
                            " 'resnet34',\n",
                            " 'resnet50',\n",
                            " 'resnet101',\n",
                            " 'resnet152',\n",
                            " 'alexnet',\n",
                            " 'vgg11_bn',\n",
                            " 'squeezenet1_0',\n",
                            " 'densenet121',\n",
                            " 'inception_v3',\n",
                            " 'efficientnet_b0',\n",
                            " 'efficientnet_b4']"
                        ]
                    },
                    "execution_count": 5,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "import opensoundscape.ml\n",
                "\n",
                "opensoundscape.ml.cnn_architectures.list_architectures()"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "268a7e09-a8e6-4e74-800a-58fa98a5b8d3",
            "metadata": {},
            "source": [
                "For convenience, we can initialize a model object by providing the name of an architecture as a string, rather than the architecture object. \n",
                "\n",
                "Create a model with a `resnet34` architecture:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "acffd989-d2c1-4f8a-a8cf-d71c81f12b6e",
            "metadata": {},
            "outputs": [],
            "source": [
                "model = cnn.CNN(architecture=\"resnet34\", classes=classes, sample_duration=2.0)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "bde3d689-1a1a-4b17-b86f-e8ac14f1da30",
            "metadata": {},
            "source": [
                "For more control over model architectures, you will initialize the architecture using the corresponding OpenSoundscape object instead:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "id": "acb4211b-cd72-401b-b7d0-55722296e0f2",
            "metadata": {},
            "outputs": [
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": [
                        "/Users/SML161/opensoundscape/opensoundscape/ml/cnn.py:616: UserWarning: Modifying .preprocessor to match architecture's expected number of channels (3) (originally 1).\n",
                        "  warnings.warn(\n"
                    ]
                }
            ],
            "source": [
                "arch = cnn_architectures.resnet50(num_classes=len(classes))\n",
                "\n",
                "model = cnn.CNN(arch, classes, sample_duration=2.0)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "4048bbc7-2d0d-4f85-a4e8-9618ad08dbe8",
            "metadata": {
                "tags": []
            },
            "source": [
                "### Use random weights\n",
                "\n",
                "By default, OpenSoundscape's models download weights pre-trained on ImageNet. \n",
                "\n",
                "You can instead start from scratch with random weights using the parameter `weights=None` when creating an architecture. For instance, let's create an Alexnet architecture with random weights:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "9f63385e-cd03-4cd8-9528-010a14280b08",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "my_arch = cnn_architectures.alexnet(\n",
                "    num_classes=len(classes), weights=None, num_channels=1\n",
                ")\n",
                "model = cnn.CNN(my_arch, classes, 2.0)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "47cb012b-aa3d-4056-93c5-da13a79ed5ab",
            "metadata": {},
            "source": [
                "### Other custom architectures\n",
                "\n",
                "We can create any Pytorch model architecture and pass it to the `architecture` argument when creating a model in OpenSoundscape. You can do this by\n",
                "* subclassing an existing Pytorch model\n",
                "* writing one from scratch. The minimum requirement is that it subclasses `torch.nn.Module` - it should at least have `.forward()` and `.backward()` methods.\n",
                "\n",
                "\n"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "c7e27573-744b-4071-bf7e-25411be10386",
            "metadata": {},
            "source": [
                "### Viewing the architecture\n",
                "The architecture is stored in the model object's `.network` attribute. We can view the network and access its parameters by examining this attribute and its sub-parameters. For instance, we can view a ResNet's feature layer using the `.fc` attribute:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "cd0d0636-bea4-42a4-9f79-a4ed340e30c5",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "Linear(in_features=512, out_features=1, bias=True)"
                        ]
                    },
                    "execution_count": 9,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "model = cnn.CNN(\"resnet18\", classes, 2.0)\n",
                "model.network.fc"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "00453342-0830-45eb-84ac-93cd1a3c7337",
            "metadata": {},
            "source": [
                "It is also possbile to replace an architecture of a model entirely simply by setting `model.architecture` to a new architecture, but this is not generally recommended unless you know what you're doing. It will completely remove anything the model has \"learned,\" since the learned weights are a part of the architecture."
            ]
        },
        {
            "cell_type": "markdown",
            "id": "fbd0d364",
            "metadata": {},
            "source": [
                "## Freezing the feature extractor\n",
                "\n",
                "Sometimes, we only wish to train the final layer or layers of a CNN, known as the \"classification head\" or simply \"classifier\", rather than training all of the layers. This technique makes it possible to fine-tune a pre-trained network using limited training data, without ruining the generalizability of the \"feature extractor\" (the term for all of the layers before the \"classification head\"). \n",
                "\n",
                "If you're using one of the built-in CNN architectures in OpenSoundscape, you can easily \"freeze\" the feature extractor (i.e., tell PyTorch not to update any of the weights during training of the classification head) with a one-liner, then proceed with training as normal (`cnn.train()...`)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "id": "3a7b6775",
            "metadata": {},
            "outputs": [],
            "source": [
                "model.freeze_feature_extractor()"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "a2fdf566",
            "metadata": {},
            "source": [
                "If you are using a custom architecture not native to OpenSoundscape, you can still freeze all but one layer with a one-liner. You just need to specify which layer or layers you wish to keep \"trainable\" or \"unfrozen\". In the case of a resnet architecture, we can point to the `.fc` (for \"fully connected\") layer as the classification layer we want to train while freezing all others. Note that different pytorch architectures may not call the classification layer `.fc`. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 11,
            "id": "c3eafbc9",
            "metadata": {},
            "outputs": [],
            "source": [
                "model.freeze_layers_except(model.network.fc)"
            ]
        },
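        {
            "cell_type": "markdown",
            "id": "f7a1c0de",
            "metadata": {},
            "source": [
                "Conceptually, freezing a layer in plain PyTorch means setting `requires_grad = False` on its parameters so the optimizer leaves them unchanged. The sketch below illustrates the idea on a small stand-in network; the network and variable names are hypothetical, and this is not necessarily how OpenSoundscape implements the methods above.\n",
                "\n",
                "```python\n",
                "from torch import nn\n",
                "\n",
                "# Hypothetical stand-in network: a feature extractor followed by a classifier\n",
                "net = nn.Sequential(\n",
                "    nn.Linear(16, 8),  # plays the role of the feature extractor\n",
                "    nn.ReLU(),\n",
                "    nn.Linear(8, 2),  # plays the role of the classification head\n",
                ")\n",
                "head = net[2]\n",
                "\n",
                "# Freeze everything, then unfreeze only the classification head\n",
                "for param in net.parameters():\n",
                "    param.requires_grad = False\n",
                "for param in head.parameters():\n",
                "    param.requires_grad = True\n",
                "\n",
                "trainable = [name for name, p in net.named_parameters() if p.requires_grad]\n",
                "print(trainable)  # only the head's weight and bias remain trainable\n",
                "```"
            ]
        },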
        {
            "cell_type": "markdown",
            "id": "02e70d49-5bd1-4ab0-8221-3a0c45d04ce4",
            "metadata": {
                "tags": []
            },
            "source": [
                "### Single-target models\n",
                "\n",
                "One decision about your architecture is whether your classification problem is single-target (exactly one label per sample) or multi-target (any number of labels per sample, including 0). Single-target models have a softmax activation layer which forces the sum of all class scores to be 1.0.\n",
                "\n",
                "This is a separate decision from the number of classes your model can potentially identify. **For example, if you are creating a model to identify only one species, your model should contain only one class, but it should still be a multi-target model.** This allows your model to predict that the species isn't present (i.e. the class score can be 0).\n",
                "\n",
                "In most cases in bioacoustic monitoring, models are multi-target. But if you would like to train a single-target model, just set `single_target=True` either when creating the model object or afterwards."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 12,
            "id": "34ee2743-ccf9-49e4-9ee0-2e71bdb83f58",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Updating torchmetrics and loss_fn to match single_target=True\n"
                    ]
                }
            ],
            "source": [
                "# Change the model to be single_target\n",
                "model.single_target = True\n",
                "\n",
                "# Or specify single_target when you create the object\n",
                "model = cnn.CNN(\"resnet18\", classes, 2.0, single_target=True)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "8c804eb4-7c40-43ad-9575-fcca66bf0a3f",
            "metadata": {
                "tags": []
            },
            "source": [
                "### Multi-target training with ResampleLoss\n",
                "\n",
                "Training multi-target models is challenging and can benefit from using a modified loss function. OpenSoundscape provides a loss function designed for training multi-target models. We recommend using this loss function when training multi-target models. You can add it to a class with an in-place helper function:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 13,
            "id": "f1848284-fc30-4cba-96f7-88f6d6a5579c",
            "metadata": {},
            "outputs": [],
            "source": [
                "from opensoundscape.ml.cnn import use_resample_loss"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "66435b66-cf27-4829-9f9c-bc09e8fa734a",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "ResampleLoss()\n"
                    ]
                }
            ],
            "source": [
                "model = cnn.CNN(\"resnet18\", classes, 2.0)\n",
                "use_resample_loss(model, train_df=train_df)\n",
                "print(model.loss_fn)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "4d6a1655-e8bd-4b84-a2db-604c5faf8a3c",
            "metadata": {},
            "source": [
                "## Spectrogram settings\n",
                "\n",
                "The parameters used to create spectrograms are very important for classifier performance. The main way you modify these parameters are by setting a custom preprocessor. \n",
                "\n",
                "OpenSoundscape also provides an additional option that can affect performance and training speed, the ability to change the size of the input spectrogram.\n",
                "\n",
                "### Custom preprocessing\n",
                "\n",
                "The [preprocessing tutorial](tutorials/preprocessors.html) gives in-depth descriptions of how to customize your preprocessing pipeline, as well as best practices for using these customizations, e.g. reviewing what the samples look like before training on them.\n",
                "\n",
                "Here, we'll just give a quick example of tweaking the preprocessing pipeline: providing the CNN with a bandpassed spectrogram object instead of the full frequency range. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "aa67c4f9-9273-4368-9864-cfae033dbcbc",
            "metadata": {},
            "outputs": [],
            "source": [
                "model = cnn.CNN(\"resnet18\", classes, 2.0)\n",
                "\n",
                "# change the min and max frequencies for the spectrogram bandpass action\n",
                "model.preprocessor.pipeline.bandpass.set(min_f=3000, max_f=5000)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "86e1980c-f24e-437e-b170-93983c065900",
            "metadata": {},
            "source": [
                "### Size of spectrogram\n",
                "\n",
                "OpenSoundscape enables you to modify the size of the spectrogram input to the classifier. \n",
                "\n",
                "Larger spectrograms have greater resolution which can help the classifier pick up on finer details.  However, potential accuracy improvements come at the cost of more resource-intensive training and prediction.\n",
                "\n",
                "To change the image size, when creating the CNN set `sample_shape = (height, width, channels)`. Most classifier architectures expect 3 channels."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "dead8446-c819-4486-8178-9991caa03ccc",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "(448, 448, 3)"
                        ]
                    },
                    "execution_count": 16,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "model = cnn.CNN(\"resnet18\", classes, 2.0, width=448, height=448, channels=3)\n",
                "p = model.preprocessor\n",
                "p.height, p.width, p.channels"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "230fce64-e2ae-4afa-bc60-b20f0a3a061b",
            "metadata": {},
            "source": [
                "## Learning parameters\n",
                "\n",
                "In a general sense, a model's **learning rate** determines how fast the model fits to the data. More specifically, it determines how much the model's weights change every time it calculates the loss function. \n",
                "\n",
                "Faster learning rates improve the speed of training and help the model leave local minima as it learns to classify, but if the learning rate is too fast, the model may not successfully fit the data or its fitting might be unstable.\n",
                "\n",
                "OpenSoundscape allows you to flexibly change parameters related to the model's optimizer. This includes parameters related to the learning rate, as well as the emphasis the model's training places on learning smaller, less complex weights, known as **regularization**.  \n",
                "\n",
                "First, let's look at the model optimization (AKA \"learning\") hyperparameters:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 17,
            "id": "82963049-9c15-4bc0-8e05-93541f9383e9",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "{'class': torch.optim.sgd.SGD,\n",
                            " 'kwargs': {'lr': 0.01, 'momentum': 0.9, 'weight_decay': 0.0005},\n",
                            " 'classifier_lr': None}"
                        ]
                    },
                    "execution_count": 17,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "model.optimizer_params"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "4b4129c3-6e9c-4a4a-bddb-e976a05bf880",
            "metadata": {},
            "source": [
                "Options for modifying the learning hyperparameters include:\n",
                "\n",
                "* Modify learning rate\n",
                "* Fine tune a model\n",
                "* Separate learning rates for feature and classifier blocks\n",
                "* Modify the learning rate schedule\n",
                "* Set the regularization weight decay \n",
                "\n",
                "### Modify learning rate\n",
                "\n",
                "A basic way to modify the learning rate on an entire model is to change the `lr` parameter:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "70d75753-45a3-412c-915e-69219f3829b6",
            "metadata": {},
            "outputs": [],
            "source": [
                "model.optimizer_params[\"kwargs\"][\"lr\"] = 0.01"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "e91b18c7-6f56-4014-9be6-c29835e6c426",
            "metadata": {},
            "source": [
                "### Fine tune a model\n",
                "One instance where we might want to modify a learning rate is to \"fine tune\" a model.\n",
                "\n",
                "After training a model for a while at a relatively high learning rate (think 0.01), we might want to \"fine tune\" it: set a lower learning rate, then train at the lower rate for a few more epochs.\n",
                "\n",
                "Let's set a low learning rate for fine tuning:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "b8efed52-b531-43fd-affb-af1a7309b907",
            "metadata": {},
            "outputs": [],
            "source": [
                "model.optimizer_params[\"kwargs\"][\"lr\"] = 0.001"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "93c856a6-91a8-4909-81bc-0683f4681bf7",
            "metadata": {},
            "source": [
                "### Separate learning rates for feature and classifier blocks\n",
                "\n",
                "Convolutional Neural Networks can be thought of as having two parts: a **feature extractor** which learns how to represent/\"see\" the input data, and a **classifier** which takes those representations and transforms them into predictions about the class identity of each sample.\n",
                "\n",
                "In PyTorch, we can customize the learning rate of different layers. For convenience, OpenSoundscape provides an easy way to set the classifier layer's learning rate separately from the rest of the network. (For more advanced use cases, see the source code of `BaseModule.configure_optimizers()` for how to set different layers to different learning rates.)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "ad7fbc84-13eb-4ace-8f47-546d71e53eab",
            "metadata": {},
            "outputs": [],
            "source": [
                "model = cnn.CNN(\"resnet18\", classes, 2.0)\n",
                "# set a high learning rate for the classifier\n",
                "model.optimizer_params[\"classifier_lr\"] = 0.02\n",
                "# set a low learning rate for the rest of the network\n",
                "model.optimizer_params[\"kwargs\"][\"lr\"] = 0.0001\n",
                "\n",
                "# these learning rates will be configured in the optimizer when you begin training"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "b10994fc-03a2-460e-9011-dacc4a476b2b",
            "metadata": {},
            "source": [
                "### Learning rate schedule\n",
                "It's often helpful to decrease the learning rate over the course of training. Reducing the amount by which the model's weights are updated as training goes on causes the learning to gradually switch from coarsely searching across possible weights to fine-tuning the weights.\n",
                "\n",
                "By default, the learning rates are multiplied by 0.7 (the learning rate \"cooling factor\") once every 10 epochs (the learning rate \"update interval\"). \n",
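                "\n",
                "As a sketch of what this schedule implies (using symbols introduced here for illustration: $\\eta_0$ for the initial learning rate, $\\gamma$ for the cooling factor, and $s$ for the update interval, and assuming the scheduler steps once per epoch), the learning rate during epoch $t$ under `StepLR` is\n",
                "\n",
                "$$\\eta_t = \\eta_0 \\, \\gamma^{\\lfloor t / s \\rfloor}$$\n",
                "\n",
                "With the defaults $\\eta_0 = 0.01$, $\\gamma = 0.7$, and $s = 10$, the learning rate is 0.01 for epochs 0 through 9, 0.007 for epochs 10 through 19, and 0.0049 for epochs 20 through 29.\n",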
                "\n",
                "Let's modify the schedule so that the learning rates decay very quickly, multiplying them by 0.1 every epoch."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 21,
            "id": "6a570174-867a-4c19-a36c-fd8e59cbc020",
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "{'class': torch.optim.lr_scheduler.StepLR,\n",
                            " 'kwargs': {'step_size': 10, 'gamma': 0.7}}"
                        ]
                    },
                    "execution_count": 21,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "model.lr_scheduler_params"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "0fb7cdf7",
            "metadata": {},
            "outputs": [],
            "source": [
                "model.lr_scheduler_params[\"kwargs\"][\"step_size\"] = 1  # decrease lr more frequently\n",
                "model.lr_scheduler_params[\"kwargs\"][\"gamma\"] = 0.1  # decrease lr by 10x each time"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "b33d968c",
            "metadata": {},
            "source": [
                "### Set the regularization weight decay\n",
                "PyTorch optimizers perform [L2 regularization](https://developers.google.com/machine-learning/crash-course/regularization-for-simplicity/l2-regularization) through the `weight_decay` parameter, which gives training an incentive to keep the model's weights small rather than large. The goal of this regularization is to reduce overfitting to the training data by reducing the complexity of the model.\n",
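                "\n",
                "As a sketch (ignoring momentum, and writing $\\lambda$ for the weight decay and $\\eta$ for the learning rate), SGD with weight decay behaves as if the penalty $\\frac{\\lambda}{2} \\lVert w \\rVert^2$ were added to the loss $L$, so each update becomes\n",
                "\n",
                "$$w \\leftarrow w - \\eta \\left( \\nabla_w L(w) + \\lambda w \\right)$$\n",
                "\n",
                "The extra $\\lambda w$ term pulls every weight toward zero at each step.\n",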
                "\n",
                "Depending on how much emphasis you want to place on the L2 regularization, you can change the weight decay parameter. By default, it is 0.0005. The higher the value for the \"weight decay\" parameter, the more the model training algorithm prioritizes smaller weights."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "c8b974d0",
            "metadata": {},
            "outputs": [],
            "source": [
                "model.optimizer_params[\"kwargs\"][\"weight_decay\"] = 0.001"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "5184996f",
            "metadata": {},
            "source": [
                "### Other LR schedulers\n",
                "Alternatively, we can use a different scheduler class, such as `torch.optim.lr_scheduler.CosineAnnealingLR`:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "30f6105a",
            "metadata": {},
            "outputs": [],
            "source": [
                "# decrease lr over 10 epochs using a cosine annealing schedule\n",
                "model.lr_scheduler_params = {\n",
                "    \"class\": torch.optim.lr_scheduler.CosineAnnealingLR,\n",
                "    \"kwargs\": {\"T_max\": 10},\n",
                "}"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "c94557a0-76f6-41d0-b2aa-3e4a9c56ff41",
            "metadata": {},
            "source": [
                "## Experiment!\n",
                "\n",
                "In this tutorial we've covered the more advanced options available to customize your CNN training.\n",
                "\n",
                "While intuition can be a helpful guide, it's not always obvious which parameters will result in the best model. This is why it's helpful to experiment with different parameters to see what works for you.\n",
                "\n",
                "To facilitate experimentation, OpenSoundscape includes integration with Weights & Biases. See the original [\"Train a CNN\" tutorial](https://opensoundscape.org/en/latest/tutorials/train_cnn.html) for more information on how to set this up.\n",
                "\n",
                "**Clean up**: Run the following code to remove the downloaded files."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "5963bca9-e8c6-4390-84ca-175b33914ced",
            "metadata": {},
            "outputs": [],
            "source": [
                "import shutil\n",
                "from pathlib import Path\n",
                "\n",
                "# uncomment to remove the labeled dataset\n",
                "# shutil.rmtree('./woodcock_labeled_data')\n",
                "Path(\"./my_pre.json\").unlink(missing_ok=True)\n",
                "\n",
                "for p in Path(\".\").glob(\"*.model\"):\n",
                "    p.unlink()"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "opso_dev",
            "language": "python",
            "name": "opso_dev"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.13.5"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 5
}