South Run



This site describes a system for automatically detecting specific bird calls in remote locations, and then providing detection reports (sound clips) via the Internet within a couple of hours of their occurrence. The system is currently trained and implemented to detect double knocks of the Ivory-billed Woodpecker.

Detection reports delivered in near real time could help locate an actively used roost cavity. Recent recordings of putative ivory-bill double knocks in the southeastern US appear biased toward the early morning and late afternoon, presumably when the bird is near its roost. If a double knock is detected around sunset, and can be identified as such within a couple of hours, searchers may have time to get into that area before the bird emerges from its roost in the morning. Likewise, an early morning detection may increase the likelihood of encountering the bird when it returns in the evening.

Some of the assumptions made here about the ivory-bill are not well supported by what is currently known about the bird’s behavior. However, the assumptions are reasonable and the payoff could be great if they are true. Acoustically detecting birds near their roost holes may be the best method to locate birds in the future for study and survey. A quality video may prove again that the bird is extant, but an acoustic system that can alert scientists to an active roost in near real time is what will be needed to take the next step.

This system consists of two different types of deployed hardware. One is an acoustic node, which is a custom digital audio recorder integrated with a digital signal processor (DSP) to record and then process audio for detection of specific sounds. The other is a cell node, which uses a cellular modem to upload sound clips of detections onto a website via a wireless data network, such as T-Mobile's. The two types of nodes communicate with each other using low-power data radios. A single cell node can serve a number of acoustic nodes simultaneously.

Nodes are designed to be hung from a branch high in a tree (20-50 feet) and away from the trunk. They can be deployed and serviced from the ground, so climbing is not necessary. The basic housing material is a length of 3” diameter PVC pipe for the acoustic node and 4” PVC or ABS for the cell node. The acoustic node uses alkaline battery packs, which are contained within the housing. Assuming operation of 3-4 recording hours per day (morning and evening), it is expected that an acoustic node will operate for up to one month between battery changes.


System Layout

The basic system layout includes a number of acoustic nodes to record and process sound, and a cell node that supports them in uploading detected audio clips to an FTP server. The sound detection range of an acoustic node depends on many factors, but 500 meters is a reasonable starting point in a calm environment. Both the acoustic and cell nodes contain data radios to transfer sound clips between them. Radio range depends mostly on transmitter power and the nature of any obstruction between the two radios. The radio currently designed into this system has been tested through intermittent, but dense, fir forest to 2 kilometers. A Southern hardwood forest in winter, especially when the radios are elevated, should be able to provide at least that and probably more. The radio specifications claim 6 kilometers range with line-of-sight.

A sample system layout is shown in the graphic. In this case, the acoustic nodes are placed for combined coverage of more than 6 kilometers of river length, assuming a 500 meter detection range for the acoustic nodes. The supporting cell node is located roughly at the center of the acoustic array so it can communicate with all of the acoustic nodes simultaneously.
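The coverage arithmetic behind such a layout can be sketched as follows. This is only an idealization: it treats the river as a straight line and assumes each acoustic node covers exactly twice its detection range along that line. The function names are hypothetical and are not part of the system.

```python
import math

def node_positions(river_length_m, detection_range_m):
    """Place nodes end to end along a straight line so their coverage just touches.

    Each node is assumed to cover 2 * detection_range_m of river length.
    Returns the position of each node, in meters from the start of the stretch.
    """
    span = 2 * detection_range_m
    count = math.ceil(river_length_m / span)
    return [detection_range_m + i * span for i in range(count)]

def farthest_link_m(positions, cell_position_m):
    """Longest radio link between the cell node and any acoustic node."""
    return max(abs(p - cell_position_m) for p in positions)

# 6 km of river, 500 m detection range, cell node at the midpoint (assumed).
positions = node_positions(6000, 500)
print(len(positions), "nodes; farthest link:", farthest_link_m(positions, 3000), "m")
```

Under these assumptions, six acoustic nodes cover the stretch and the farthest node sits 2.5 kilometers from a centrally placed cell node. A real river meanders, so actual node counts and link distances will differ.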


Technical Details

A-Node Block Diagram

The acoustic node contains a real-time clock with a calendar, and can therefore be programmed to operate during periods of the user’s choosing, such as times relative to sunrise and sunset. Three main events occur sequentially in the node during an operating period: (1) Audio is digitized and written to the Compact Flash during the programmed recording period. (2) Once recording is complete, the audio is processed for targets in the DSP, and short sound clips of possible detections (with likelihood scores) are saved back onto the Compact Flash. (3) The three sound clips with the greatest likelihood scores – assuming they pass a minimum threshold – are transferred to an intermediate memory and then transmitted by the radio to the cell node.
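The selection in step (3) can be illustrated with a short sketch. The `Clip` type, the score scale and the threshold value are assumptions made for illustration; the node's actual firmware and data structures are not described here.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    start_s: float   # offset of the clip within the recording, in seconds
    score: float     # likelihood score assigned by the detector (scale assumed)

def select_clips_for_upload(clips, min_score, max_clips=3):
    """Keep clips at or above min_score and return the best max_clips, highest score first."""
    passing = [c for c in clips if c.score >= min_score]
    passing.sort(key=lambda c: c.score, reverse=True)
    return passing[:max_clips]

# Hypothetical detections from one recording period.
detections = [Clip(12.0, 0.20), Clip(340.5, 0.90), Clip(901.2, 0.60),
              Clip(1502.8, 0.95), Clip(2210.0, 0.70)]
for clip in select_clips_for_upload(detections, min_score=0.5):
    print(clip)
```

If fewer than three clips pass the threshold, fewer are sent; if none pass, nothing is transmitted for that period.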

These events occur sequentially rather than simultaneously for a couple of reasons. First, program code complexity is greatly reduced by keeping the recording and detection processes separate, with no need to coordinate them in the DSP. Second, the primary factor in power consumption in the node is the processor clock speed of the DSP. Recording takes much more time and has far lower computational requirements than the detection processing, so it is more efficient to separate the two processes and do the much longer recording step at a slower clock speed.

The acoustic node housing is made from a 14" length of 3" diameter PVC pipe. Inside are contained the acoustic node electronics, two alkaline battery packs, a microphone and cables.

C-Node Block Diagram

Like the acoustic node, the cell node has a real-time clock (RTC) that allows its components to be turned on and off at specific times. In this application, the data radio turns on at about the same time the acoustic node has finished recording, and waits to receive a radio message containing detected audio clips. The audio data is transferred from the radio through the microcontroller to a memory for temporary storage. Once all of the audio data is received, the radio is turned off, and the cellular transfer is initiated. Data is then transferred from the memory to the cellular modem and then onto the Internet.

The data radio is a very small module that plugs into the embedded electronics of its respective node. The radio module is slightly larger than a postage stamp, and operates from the same power supply as the other electronics. It transmits in the 900 MHz ISM band with a data rate of 9600 bits per second. This is a rather slow speed for transferring raw audio, but a necessary tradeoff at the moment for the benefit of range and power consumption. Small data radios are used extensively in military and commercial remote monitoring applications, so continuing improvements in radio features (size, range, power, etc.) can be expected in the future.
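To give a feel for why 9600 bits per second is slow for raw audio, the sketch below estimates the transfer time for uncompressed clips. The clip length (2 seconds), sample rate (8 kHz) and sample width (16-bit mono) are assumptions chosen for illustration; the actual clip format is not stated here, and radio protocol overhead is ignored.

```python
def transfer_time_s(clip_seconds, sample_rate_hz, bytes_per_sample, link_bps=9600):
    """Seconds needed to send one uncompressed mono clip over a link of link_bps."""
    payload_bits = clip_seconds * sample_rate_hz * bytes_per_sample * 8
    return payload_bits / link_bps

# Assumed clip format: 2 s, 8 kHz, 16-bit mono (not taken from the system itself).
per_clip = transfer_time_s(clip_seconds=2.0, sample_rate_hz=8000, bytes_per_sample=2)
print(f"one clip: {per_clip:.1f} s, three clips: {3 * per_clip:.1f} s")
```

With these assumed numbers, a single 2-second clip takes roughly 27 seconds to send, so the radio must stay on far longer than the audio it carries lasts.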

The cellular modem transfers the audio clips to a website acting as an FTP server, which then makes the audio files available via the Internet. The cellular data service is very similar to that used by a BlackBerry or iPhone. Whether coverage is available in a particular area, and how good it is, depends heavily on which cellular network is being used. Many remote areas have no coverage at all, and in that case, this application would not work without adding radio relays to transfer data into a covered area. T-Mobile is the current data service provider.


Double Knock Algorithm

The DSP is programmed with an algorithm designed to detect Campephilus-like double knocks in the recorded audio. Suspected ivory-bill double knocks recorded in the US and known Campephilus double knocks recorded in Latin America typically have their peak spectral power in the 500-1000 Hz range, with a knock spacing of 50 to 120 milliseconds.

The current algorithm has four stages:

  1. The audio data is sampled in time over the recording period and mathematically transformed into the frequency domain so that particular frequencies in a time slice can be isolated.  Running statistics of the power in the target frequencies are computed and time slices whose power exceeds a threshold are tagged for further processing.

  2. Time slices detected in the first stage are examined for some characteristics of double knocks. A clustering of tagged time slices should occur, since a single slice is less than 10 ms in duration and each knock, including the echoes, is significantly longer than that. Also, two peaks should occur, spaced between 50 and 120 ms.

  3. If a clip of audio survives rejection in the second stage, more detailed time-frequency measurements are made to generate features that are used as inputs to a neural network classifier. The neural net provides a two-class result: target and non-target. The target class is trained using samples from known, suspected and simulated double knocks. The non-target samples are other sounds that have strong spectral power in the 500-1000 Hz range. 

  4. An algorithm sensitive enough to detect faint double knocks is going to generate a number of false targets as well. For this reason, sounds must be validated by a person. Short sound clips of the three best candidates during each recording period are uploaded to an FTP server for human validation.
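The first two stages can be sketched roughly as follows. This is not the DSP code: the sample rate, slice length, threshold rule (running mean plus a multiple of the standard deviation), warm-up length and minimum cluster size are all assumptions, and every function name is hypothetical. The neural network of stage 3 and the upload of stage 4 are not covered.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000        # assumed; the recorder's actual rate is not stated
SLICE_LEN = 64               # 8 ms per slice at 8 kHz, under the 10 ms mentioned
BAND_HZ = (500.0, 1000.0)    # target band from the text
SPACING_S = (0.050, 0.120)   # knock spacing from the text

def band_power(slice_):
    """Sum of DFT power over the bins that fall inside BAND_HZ."""
    n = len(slice_)
    lo = math.ceil(BAND_HZ[0] * n / SAMPLE_RATE_HZ)
    hi = math.floor(BAND_HZ[1] * n / SAMPLE_RATE_HZ)
    total = 0.0
    for k in range(lo, hi + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(slice_))
        total += abs(coeff) ** 2
    return total

def tag_slices(samples, k_sigma=4.0, warmup=10):
    """Stage 1: tag slices whose band power exceeds running mean + k_sigma * std."""
    tagged = []
    count, mean, m2 = 0, 0.0, 0.0
    for idx in range(len(samples) // SLICE_LEN):
        p = band_power(samples[idx * SLICE_LEN:(idx + 1) * SLICE_LEN])
        std = math.sqrt(m2 / count) if count > 1 else 0.0
        if count >= warmup and p > mean + k_sigma * std:
            tagged.append(idx)
        else:
            # Only untagged slices update the running statistics (Welford's method).
            count += 1
            delta = p - mean
            mean += delta / count
            m2 += delta * (p - mean)
    return tagged

def cluster_onsets(tagged, min_slices=2):
    """Group consecutive tagged slices; return the first slice of each large-enough cluster."""
    clusters = []
    for idx in tagged:
        if clusters and idx == clusters[-1][1] + 1:
            clusters[-1][1] = idx
        else:
            clusters.append([idx, idx])
    return [first for first, last in clusters if last - first + 1 >= min_slices]

def find_double_knocks(tagged):
    """Stage 2: pair adjacent cluster onsets whose spacing falls inside SPACING_S."""
    slice_s = SLICE_LEN / SAMPLE_RATE_HZ
    onsets = cluster_onsets(tagged)
    pairs = []
    for a, b in zip(onsets, onsets[1:]):
        if SPACING_S[0] <= (b - a) * slice_s <= SPACING_S[1]:
            pairs.append((a * slice_s, b * slice_s))
    return pairs

def synthetic_double_knock(total_s=1.0, onsets_s=(0.40, 0.48), burst_s=0.024, tone_hz=750.0):
    """Silent background with two short tone bursts: a deliberately easy test signal."""
    samples = [0.0] * int(total_s * SAMPLE_RATE_HZ)
    for onset in onsets_s:
        start = int(round(onset * SAMPLE_RATE_HZ))
        for i in range(start, start + int(round(burst_s * SAMPLE_RATE_HZ))):
            samples[i] = math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE_HZ)
    return samples

print(find_double_knocks(tag_slices(synthetic_double_knock())))
```

The synthetic signal has no background noise, so it says nothing about real-world performance; it only shows how tagged slices cluster into knocks and how the spacing test pairs them. With 8 ms slices, the spacing is resolved only to the nearest slice.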

A notable aspect of creating an algorithm for extracting a particular sound in a variable, high-clutter environment, such as the outdoors, is how the algorithm will change and mature over time. Algorithm development, in this case, is much more art than science, and maturation will be a continuing process of trial and error as new conditions for both targets and clutter are encountered. The algorithm this system has today may be completely different from the one it uses a year from now.


Contact

Please contact me if you have any questions or comments about this system.

Mark Gahler,   mark at south-run.com                    



                                                                 © 2009-2010  South Run LLC