Whistles, songs, boings, and biotwangs: Recognizing whale vocalizations with AI

To protect animals that live in remote environments, researchers must be able to find them and track how their populations move over time. As long-term passive acoustic monitoring has grown more technologically sophisticated, automatic species identification tools built on large datasets from these recorded soundscapes have become increasingly vital for conservation and ecological research. While models such as Google Perch can classify thousands of bird vocalizations, similar models that classify vocalizations from several whale species at once have proven more challenging to develop.

The acoustic range of whale species is incredibly broad, from as low as 10 Hz for blue whales to above 120 kHz for odontocetes (toothed whales). Recordings also vary dramatically by location and over time, which can make model development difficult. Additionally, researchers often don’t know what types of vocalizations some especially elusive whale species make, which complicates identifying those animals in the soundscapes. This is illustrated by the mystery surrounding a sound, called a “Biotwang”, that was first recorded almost a decade ago in the depths of the Mariana Trench. The sound has a "metallic" or "chime-like" quality, quite unlike the tonal moans more typical of whale vocalizations. In a recent paper, our collaborators at the U.S. National Oceanic and Atmospheric Administration (NOAA) determined that the Biotwang sound is uniquely produced by the elusive Bryde’s whale (pronounced "broodus").

Today we are delighted to share Google’s latest whale bioacoustics model, which can identify eight distinct species, as well as multiple calls for two of those species. Following our collaborators’ discovery connecting Biotwangs to the Bryde’s whale, and as reported in the same paper, we expanded the model to include Biotwangs and used it to label more than 200,000 hours of underwater recordings. Here we describe the model and discuss some of the new insights into the ecology of whale species that it is helping researchers to unlock. The model is now available for download via Kaggle Models.

Project background

Google Research’s journey with whale vocalization classification started in 2018 when we developed a novel classification model for detecting humpback whales in partnership with the Pacific Islands Fisheries Science Center (PIFSC) of NOAA. The model was used to identify humpback calls from over 187,000 hours of audio collected by NOAA, confirming spatio-temporal patterns of humpback songs and uncovering a new location at Kingman Reef where humpback songs had not been previously observed. We made further “splashes” with this model in collaboration with Google Creative Lab when we released Pattern Radio, an interactive visualization of a full year of underwater audio collected near Hawaii, labeled by the model, and peppered with additional expert insights on sections of the data. We released our humpback model publicly following Google’s AI Principles to understand and minimize the potential for misuse of the model.

These efforts led to a partnership with the Department of Fisheries and Oceans, Canada (DFO), especially with their Marine Mammal Response Program, whose operations in the Salish Sea include stewardship of the critically endangered Southern Resident Killer Whale population. Together, we published an orca (killer whale) detection model, which DFO also deployed in their hydrophone monitoring network, enabling real-time alerts.

A new whale bioacoustics model

We developed our new multi-species whale model to score and classify underwater audio for eight distinct species. Two of the species are further broken down by vocalization type, yielding a total of twelve classes. The model is multi-label, so scores are independent and not restricted to the top class or classes.
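To make the multi-label behaviour concrete, here is a minimal sketch. It assumes each class score comes from its own sigmoid, which is a common way to build multi-label outputs; the post does not state the model's actual output activation. The class names and logit values are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a raw logit into a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_scores(logits):
    """Score every class independently (assumed per-class sigmoid).

    Unlike a softmax, the scores do not compete and need not sum to 1,
    so two species calling in the same window can both score highly.
    """
    return {name: sigmoid(z) for name, z in logits.items()}


def detected(scores, threshold=0.5):
    """Return every class whose score clears the threshold, not just the top one."""
    return sorted(name for name, s in scores.items() if s >= threshold)


# Hypothetical logits for one audio window containing two overlapping callers.
example_logits = {"humpback": 2.0, "orca_call": 1.5, "minke": -3.0}
example_scores = multilabel_scores(example_logits)
print(detected(example_scores))
```

In practice a per-class threshold would likely be tuned on held-out data rather than fixed at 0.5.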

The following is the list of species for which the model can provide classification scores:

Audio examples of each of the species are in this repository.

Model development

The first step in our model is to convert the raw audio data into images called spectrograms, each representing a 5-second window of sound. The “front-end” of the model uses a mel-scaled frequency axis and log amplitude compression, then normalizes each frequency bin by subtracting its 5th-percentile log amplitude. The model then classifies these images as any of the 12 classes of whale species or vocalization.
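The compression and normalization steps described above can be sketched as follows. This is an illustrative sketch, not the model's actual front-end: it assumes a mel-scaled energy spectrogram has already been computed (the short-time Fourier transform and mel filterbank are omitted), and the function names, the HTK-style mel formula, the interpolated percentile, and the `floor` constant are all assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale, one common convention (the model's variant is not stated)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def normalize_log_mel(mel_energy, pct=5.0, floor=1e-6):
    """Log-compress a mel spectrogram and subtract a per-bin noise floor.

    mel_energy[t][b] is the non-negative energy at frame t and mel bin b.
    """
    # Log amplitude compression; `floor` avoids log(0) on silent bins.
    log_spec = [[math.log(max(e, floor)) for e in frame] for frame in mel_energy]
    n_bins = len(log_spec[0])
    # The 5th percentile over time approximates each bin's steady background level.
    floors = [percentile([frame[b] for frame in log_spec], pct) for b in range(n_bins)]
    return [[frame[b] - floors[b] for b in range(n_bins)] for frame in log_spec]
```

Subtracting a low percentile per bin, rather than the mean, keeps loud transient calls from inflating the background estimate.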

Because long-term passive acoustic monitoring requires not only correctly classifying species but also correctly rejecting background and non-animal sound events, we did not restrict training to only positive labels. We sampled negative and background data extensively from the recordings provided by our partners. For model validation, we held out a uniformly random 20% subset of the available training data as the test set.
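A uniformly random 20% hold-out like the one described above could look like the sketch below. The function name, the fixed seed, and the use of integer example IDs are assumptions for illustration; the post does not describe how the split was implemented.

```python
import random


def holdout_split(example_ids, test_fraction=0.2, seed=0):
    """Shuffle example IDs and hold out a fixed fraction as the test set."""
    ids = list(example_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]  # (train, test)


train_ids, test_ids = holdout_split(range(100))
print(len(train_ids), len(test_ids))
```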

The model performance on the test set is described in the chart below. Overall, the model has good discriminative performance for each class. The classes for Minke, NARW, NPRW, and Bryde's have values close to one on all three metrics, indicating high model performance that requires a less severe tradeoff between false positives and false negatives. That tradeoff is more prominent for orca echolocation and whistles.

Model performance on the test set by species. A high value of the area under the receiver operating characteristic curve, or AUC (ROC), indicates the model is able to discriminate well between positives and negatives. Sensitivity @ 0.99 is the fraction of actual positives scoring above the threshold that rejects 99% of the true negatives. Finally, Precision @ 0.5 (short for Precision @ recall 0.5) is the fraction of positive predictions that are correct at a reasonably sensitive threshold (the threshold that 50% of the true positives score above).
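For readers who want to reproduce these three metrics on their own scores, the definitions in the caption can be sketched in plain Python. The function names, the tie handling, and the toy scores are assumptions; this is not the evaluation code used for the chart.

```python
import math


def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_at_specificity(pos_scores, neg_scores, specificity=0.99):
    """Fraction of positives above the threshold rejecting `specificity` of negatives."""
    ordered = sorted(neg_scores)
    # Smallest index such that (index + 1) / n_neg >= specificity.
    idx = max(math.ceil(specificity * len(ordered) - 1e-9) - 1, 0)
    threshold = ordered[idx]
    return sum(p > threshold for p in pos_scores) / len(pos_scores)


def precision_at_recall(pos_scores, neg_scores, recall=0.5):
    """Precision at the highest threshold that still recovers `recall` of the positives."""
    ordered = sorted(pos_scores, reverse=True)
    k = max(math.ceil(recall * len(ordered) - 1e-9), 1)
    threshold = ordered[k - 1]
    true_pos = sum(p >= threshold for p in pos_scores)
    false_pos = sum(n >= threshold for n in neg_scores)
    return true_pos / (true_pos + false_pos)


# Toy scores for one class: four positive windows, ten negative windows.
pos = [0.85, 0.95, 0.5, 0.99]
neg = [i / 10 for i in range(10)]
print(roc_auc(pos, neg))
print(sensitivity_at_specificity(pos, neg, specificity=0.9))
print(precision_at_recall(pos, neg, recall=0.5))
```

The pairwise AUC loop is quadratic in the number of windows, which is fine for illustration but too slow for hundreds of thousands of hours of audio; a rank-based computation would be the usual replacement.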

More details about the model and example code for applying to audio data are available on Kaggle Models.

New labels in the model

While there are many distinct and fascinating whale sounds included in our data, we highlight a few specific species and some of their unique sounds included in the model. We also highlight how we generated labels for some of these species and sounds even when the training data provided by our partners did not have these sounds labeled.

Minke whale “boing”

A mystery older than that of Biotwangs surrounds a different “metallic” sound that dates back to submarine recordings from the 1950s. It wasn’t until 2005 that NOAA scientists attributed this specific noise to minke whales (Balaenoptera acutorostrata). Our initial set of labels from PIFSC did not include this vocalization, called a “boing,” but while developing the original Google humpback model, we noticed it as an error mode in the first attempt. We managed to expand on those "found" labels enough to include minke as a class in the multispecies model.

Audio and spectrogram of a minke whale "boing".

North Pacific Right Whale upcalls and “gunshot” calls

The North Pacific Right Whale population is the only known population of right whales to “sing”. The eastern population of these whales is estimated to have only 30–35 individuals. While an “upcall” could come from a right whale, bowhead, or even a humpback, the North Pacific Right Whales can be distinguished by their unique “gunshot” call.

Audio and spectrogram of the upcall of a North Pacific right whale.

Audio and spectrogram of a series of North Pacific right whale gunshot calls.

Blue and fin whales

PIFSC had annotated subsets of their data for the presence of blue and fin whales prior to our initial collaboration on the humpback model. These species are present not only around the Hawaiian Islands but also in the offshore waters of all the world's major oceans.

For this work, their presence was particularly notable in the subset from the MARS hydrophone, operated by the Monterey Bay Aquarium Research Institute (MBARI). However, we did not have ground truth labels for the MARS data, so we trained a blue-and-fin-specific model on PIFSC data only and applied it to create pseudo-labels for the MBARI data.
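The pseudo-labeling step can be sketched as below, under stated assumptions: a helper model trained on one labeled dataset scores windows from an unlabeled dataset, and only confident windows are kept as labels. The thresholds, the function names, and the stand-in scoring function are hypothetical; the post does not say how pseudo-labels were selected.

```python
def pseudo_label(windows, score_fn, positive_threshold=0.9, negative_threshold=0.1):
    """Turn confident helper-model scores into labels for unlabeled windows.

    windows: iterable of (window_id, features) pairs from the unlabeled dataset.
    score_fn: the helper model, mapping features to a score in [0, 1].
    Windows scoring between the two thresholds are left unlabeled.
    """
    labeled = []
    for window_id, features in windows:
        score = score_fn(features)
        if score >= positive_threshold:
            labeled.append((window_id, 1))
        elif score <= negative_threshold:
            labeled.append((window_id, 0))
    return labeled


# Stand-in for a trained model: here the "feature" is already a score.
def toy_score_fn(features):
    return features


unlabeled = [("w0", 0.97), ("w1", 0.55), ("w2", 0.03)]
print(pseudo_label(unlabeled, toy_score_fn))
```

Dropping the uncertain middle band trades label quantity for label quality, which matters because any helper-model errors are inherited by the model trained on the pseudo-labels.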

Audio and spectrogram of a central Pacific blue whale call.

Audio and spectrogram of a fin whale call.

New insights

Bryde’s whales are baleen whales in the same genus as blue and fin whales. While sightings of these animals have been reported around the world, relatively little is known about their movements or population structure. Recordings collected in the Mariana Trench in 2014 and 2015 captured a unique vocalization called the “Biotwang”. This complex, 5-part call lasts approximately 3.5 seconds, starting with a low-frequency downsweeping moan from approximately 44 Hz to 30 Hz and followed by a “metallic” sound that goes up to 8000 Hz. Because the vocalizations were not associated with sightings of the animals, the researchers originally attributed them to an undetermined baleen species.

Photo of a Bryde’s whale spotted during a NOAA cetacean survey in the Mariana Archipelago in 2010 (Credit: NOAA Fisheries/Adam Ü, NMFS MMPA-ESA Permit #14097).

Subsequently, NOAA researchers were able to attribute Biotwangs to Bryde’s whales by aligning visual observations and acoustic data captured by sonobuoys. In the new paper, they report the true identity of these long-mysterious, twangy calls for the first time. That positive identification enabled us to improve our multi-species whale model by labeling the Biotwangs as Bryde’s whale signatures in the training data. When applied to the collection of long-term passive acoustic datasets, this led to the discovery of many instances of this call in the western North Pacific Ocean, revealing potential population differences between the central and western Pacific Bryde’s whales and uncovering a seasonality to these whales’ migration patterns.

Audio and spectrogram of the Biotwang call of a Bryde's whale. The highest intensity downsweep is at low frequencies, while the additional lines directly above are harmonics that aren't always present. The part that looks like quotation marks is the higher frequency “metallic” sound.

Extending the model to additional whale species and specific sounds

While our training data covered only eight of ~94 cetacean species, the pre-classifier activations generalize better than those from our prior (humpback) model. This is due to more target classes (12 versus 1), increased variation in audio input from combining diverse datasets, and the inclusion of some species-agnostic target classes like "echolocation" and "call."

The model can be called in isolation via the TensorFlow SavedModel API, but the generalization use case above is further supported by our open-source bioacoustics tools for efficient active learning and agile modeling, found in our Google Perch GitHub repository. So not only can you use this model to find the species and vocalizations it was trained on, but you can also use its pre-trained embeddings to search for, identify, and quickly create a classifier for new sounds or species of whales.
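As a rough illustration of embedding-based search, the sketch below ranks stored clips by cosine similarity to one example of a new sound. It assumes embeddings have already been extracted from the model; the tiny two-dimensional vectors, the clip IDs, and the function names are hypothetical, and this is not the API of the Perch tools.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, library, top_k=3):
    """Return the IDs of the clips whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query, emb), clip_id) for clip_id, emb in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [clip_id for _, clip_id in scored[:top_k]]


# Hypothetical embeddings; real ones would have many more dimensions.
library = {"clip_a": [1.0, 0.0], "clip_b": [0.0, 1.0], "clip_c": [0.9, 0.1]}
query = [1.0, 0.0]  # embedding of one example of the new sound
print(rank_by_similarity(query, library))
```

The top-ranked clips can then be reviewed by an expert and used as training examples for a small classifier on top of the embeddings, which is the general idea behind the active learning workflow mentioned above.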

Acknowledgements

This work was done by Matt Harvey, Lauren Harrell, Julie Cattiau, Tom Denton, and Mikko Ilmonen.

We would also like to thank our external partners: Ann Allen (NOAA Fisheries Pacific Islands); Carrie Wall (University of Colorado, Boulder and NOAA NCEI); Paul Cottrell, James Pilkington, Miguel Neves dos Reis (Department of Fisheries and Oceans, Canada); Harald Yurk (Simon Fraser University); John Ryan and Danelle Cline (Monterey Bay Aquarium Research Institute); Catherine Berchok (NOAA Alaska Fisheries Science Center); Daniel Woodrich (NOAA Alaska Fisheries Science Center and University of Washington Cooperative Institute for Climate, Ocean and Ecosystem Studies); Marc Lammers, Anke Kügler, and Eden Zang (NOAA Hawaiian Islands Humpback Whale National Marine Sanctuary); Genevieve Davis and Sofie Van Parijs (NOAA Northeast Fisheries Science Center); Nicole Pegg (NOAA NEFSC/Florida Atlantic University).
