ESTIMATING AZIMUTH AND ELEVATION FROM
INTERAURAL DIFFERENCES
Keith D. Martin
Perceptual Computing Section
MIT Media Lab, E15-401
Cambridge, MA 02139
***@*****.***.***
ABSTRACT

Modeling of human auditory localization has largely been limited to
lateralization, or left-to-right position. This paper describes an
attempt to tackle the more complicated problem of position estimation
with two degrees of freedom (azimuth and elevation). Differences in
interaural intensity and arrival time are extracted from the acoustic
signals at the left and right eardrums, and an estimate of position
is formed which is optimal for certain classes of source signals.
Examples of spatial likelihood maps generated by the model are
given and the types of errors made by the model are quantified. It
is suggested that such a model may work well in conjunction with
a spectral cue model like the one suggested by Zakarauskas and
Cynader (J. Acoust. Soc. Amer., Vol. 94, 1993, pp. 1323-1331).

1. INTRODUCTION

1.1. HRTFs and Eardrum Recordings

It is generally accepted that the cues used for localization are
embodied in the free-field to eardrum, or head-related, transfer
function (HRTF). The HRTF includes the high frequency shadowing due to
the presence of the head and torso, as well as the directional-dependent
spectral variations imparted by the diffraction of sound waves by the
pinna.

For free-field sound sources more than a few feet away, the acoustic
wave front reaching the head may be approximated by a plane wave.
To the degree that this approximation is valid, interaural differences
do not vary perceptibly with distance. Therefore, source distance is
ignored in this paper, and HRTFs are assumed to be constant with
respect to distance.

1.2. Interaural Differences, Localization and the Precedence Effect

Interaural differences are often cited as the most significant cues for
lateralization (left-to-right position), and spectral cues based on
features of the HRTF are given credit for humans' ability to perform
vertical localization tasks [1]. Zakarauskas and Cynader describe a
localization model based on spectral features [2]. Interaural
differences are largely dismissed as cues for vertical localization,
possibly because most studies of vertical localization are conducted with
source locations on the median plane, where interaural differences
are minimized, although Searle et al. have described a localization
model based on interaural differences for sources on or near the
median plane [3], and Lim and Duda have shown that interaural
intensity differences are viable vertical localization cues for sources
away from the median plane [4]. These models estimate azimuth
and elevation independently rather than in combination. Also, they
localization of sources in the presence of reflections [5].

1.3. Goal

In this paper, we will describe a system that addresses these points.
The goal is to produce a model that calculates a set of interaural
cues and infers a position on the two-dimensional surface of a sphere
directly. Further, the model should be able to exhibit human-like
robustness in ambient environments; thus, we include a mechanism
corresponding to the precedence effect.

2. FORM OF THE MODEL

For purposes of this paper, we define the following notations for
interaural differences. We shall refer to the interaural intensity
difference (IID), which is the difference (in dB) between the signal
levels at the two ears, the interaural phase delay (IPD), which is the
time delay between the fine structure of the signals at the two ears,
and the interaural envelope delay (IED), which is roughly equivalent
to the difference in group delay of the signals at the two ears.

The model described in this paper can be broken into several layers,
as shown in figure 2. In describing the model, the layers are grouped
into two sections: (1) the model's front end, which transforms
the acoustic signals at the two eardrums into measures of interaural
differences, and (2) a statistical estimator which determines the
position which is most likely to have given rise to the measured
interaural differences.

The structure of the model is conceptually simple. At the input,
the eardrum signals are passed through identical filter banks, which
are intended to model the time-frequency analysis performed by the
cochlea. Envelopes are estimated for each channel by squaring and
smoothing the filter outputs. The envelope signals pass through
onset detectors, which note the time and relative intensity of energy
peaks in each signal. This information is used by the interaural

[Figure 1 diagram; axis labels: above, right, front]
Figure 1: The coordinate system used in this paper: azimuth
; ;
( 180