Model Position

Location: Cambridge, MA

Posted: January 24, 2013


Resume:

ESTIMATING AZIMUTH AND ELEVATION FROM INTERAURAL DIFFERENCES

Keith D. Martin

Perceptual Computing Section

MIT Media Lab, E15-401

Cambridge, MA 02139

***@*****.***.***

ABSTRACT

Modeling of human auditory localization has largely been limited to lateralization, or left-to-right position. This paper describes an attempt to tackle the more complicated problem of position estimation with two degrees of freedom (azimuth and elevation). Differences in interaural intensity and arrival time are extracted from the acoustic signals at the left and right eardrums, and an estimate of position is formed that is optimal for certain classes of source signals. Examples of spatial likelihood maps generated by the model are given, and the types of errors made by the model are quantified. It is suggested that such a model may work well in conjunction with a spectral cue model like the one suggested by Zakarauskas and Cynader (J. Acoust. Soc. Amer., Vol. 94, 1993, pp. 1323-1331).

1. INTRODUCTION

1.1. HRTFs and Eardrum Recordings

It is generally accepted that the cues used for localization are embodied in the free-field-to-eardrum, or head-related, transfer function (HRTF). The HRTF includes the high-frequency shadowing due to the presence of the head and torso, as well as the direction-dependent spectral variations imparted by the diffraction of sound waves by the pinna.

For free-field sound sources more than a few feet away, the acoustic wave front reaching the head may be approximated by a plane wave. To the degree that this approximation is valid, interaural differences do not vary perceptibly with distance. Therefore, source distance is ignored in this paper, and HRTFs are assumed to be constant with respect to distance.

1.2. Interaural Differences, Localization and the Precedence Effect

Interaural differences are often cited as the most significant cues for lateralization (left-to-right position), and spectral cues based on features of the HRTF are credited with humans' ability to perform vertical localization tasks [1]. Zakarauskas and Cynader describe a localization model based on spectral features [2]. Interaural differences are largely dismissed as cues for vertical localization, possibly because most studies of vertical localization are conducted with source locations on the median plane, where interaural differences are minimized. However, Searle et al. have described a localization model based on interaural differences for sources on or near the median plane [3], and Lim and Duda have shown that interaural intensity differences are viable vertical localization cues for sources away from the median plane [4]. These models estimate azimuth and elevation independently rather than in combination. Also, they [...] localization of sources in the presence of reflections [5].

1.3. Goal

In this paper, we describe a system that addresses these points. The goal is to produce a model that calculates a set of interaural cues and infers a position on the two-dimensional surface of a sphere directly. Further, the model should be able to exhibit human-like robustness in ambient environments; thus, we include a mechanism corresponding to the precedence effect.

2. FORM OF THE MODEL

For purposes of this paper, we define the following notation for interaural differences. We shall refer to the interaural intensity difference (IID), which is the difference (in dB) between the signal levels at the two ears; the interaural phase delay (IPD), which is the time delay between the fine structure of the signals at the two ears; and the interaural envelope delay (IED), which is roughly equivalent to the difference in group delay of the signals at the two ears.

The model described in this paper can be broken into several layers, as shown in Figure 2. In describing the model, the layers are grouped into two sections: (1) the model's front end, which transforms the acoustic signals at the two eardrums into measures of interaural differences, and (2) a statistical estimator that determines the position most likely to have given rise to the measured interaural differences.

The structure of the model is conceptually simple. At the input, the eardrum signals are passed through identical filter banks, which are intended to model the time-frequency analysis performed by the cochlea. Envelopes are estimated for each channel by squaring and smoothing the filter outputs. The envelope signals pass through onset detectors, which note the time and relative intensity of energy peaks in each signal. This information is used by the interaural

[Figure 1: The coordinate system used in this paper, with axes labeled "above", "right", and "front".]
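Section 1.1 relies on the plane-wave approximation: once a source is more than a few feet away, interaural differences stop depending on distance. The following is a minimal sketch of that claim for arrival time alone. It assumes a simplified geometry that is not the paper's: the two ears are points 17.5 cm apart in free space (no head, so no shadowing or diffraction), the source lies in the horizontal plane, and sound travels at 343 m/s. The constants and function names are illustrative only.

```python
import math

# Assumed constants for illustration only (not taken from the paper).
SPEED_OF_SOUND = 343.0      # m/s, air at roughly 20 degrees C
EAR_HALF_SPACING = 0.0875   # m, hypothetical half-distance between the ears


def itd_point_source(azimuth_deg: float, distance_m: float) -> float:
    """Arrival-time difference (left minus right, in seconds) for a point source
    in the horizontal plane. The ears are two points in free space, so head
    shadowing and diffraction are ignored. Positive values mean the sound
    reaches the right ear first."""
    az = math.radians(azimuth_deg)
    # x points to the right, y points to the front; azimuth 0 is straight ahead.
    sx = distance_m * math.sin(az)
    sy = distance_m * math.cos(az)
    d_left = math.hypot(sx + EAR_HALF_SPACING, sy)   # left ear at x = -a
    d_right = math.hypot(sx - EAR_HALF_SPACING, sy)  # right ear at x = +a
    return (d_left - d_right) / SPEED_OF_SOUND


def itd_plane_wave(azimuth_deg: float) -> float:
    """Far-field (plane-wave) limit of the same quantity; it has no distance term."""
    return 2.0 * EAR_HALF_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


if __name__ == "__main__":
    approx = itd_plane_wave(45.0)
    for r in (0.25, 0.5, 1.0, 2.0, 5.0):
        exact = itd_point_source(45.0, r)
        print(f"r = {r:4.2f} m: point source {exact * 1e6:6.1f} us, "
              f"plane wave {approx * 1e6:6.1f} us")
```

For a source at 45 degrees, the point-source value converges to the plane-wave value as distance grows. Under this toy geometry the discrepancy falls off roughly with the square of the distance, and at 1 m it is already below one microsecond.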
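The three measures defined in Section 2 can be illustrated with a broadband sketch. This is a simplification, not the model's front end. The paper computes interaural differences per filter-bank channel. Here, by contrast, IID is the ratio of total energies in dB, IPD is the lag of the peak of the cross-correlation of the raw (fine-structure) signals, and IED is the lag of the peak of the cross-correlation of squared-and-smoothed envelopes. The function names, the ±1 ms search range, and the 1 ms smoothing window are assumptions.

```python
import numpy as np


def iid_db(left: np.ndarray, right: np.ndarray) -> float:
    """Interaural intensity difference in dB: left-ear level minus right-ear level."""
    return float(10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2)))


def lag_of_best_match(left: np.ndarray, right: np.ndarray, max_lag: int) -> int:
    """Lag in samples (within +/- max_lag) that maximizes the cross-correlation.
    Positive means the right-ear signal arrives later than the left-ear signal."""
    n = len(left)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            score = np.dot(left[: n - lag], right[lag:])
        else:
            score = np.dot(left[-lag:], right[: n + lag])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def smoothed_energy(x: np.ndarray, win: int) -> np.ndarray:
    """Envelope estimate: square the signal, then apply a moving average."""
    return np.convolve(x ** 2, np.ones(win) / win, mode="same")


def interaural_differences(left, right, fs, max_delay_s=0.001, smooth_s=0.001):
    """Return broadband IID (dB), IPD (s) and IED (s) for one pair of eardrum signals."""
    max_lag = int(round(max_delay_s * fs))
    win = max(1, int(round(smooth_s * fs)))
    env_l = smoothed_energy(left, win)
    env_r = smoothed_energy(right, win)
    return {
        "IID_dB": iid_db(left, right),
        # Fine-structure delay: cross-correlate the signals themselves.
        "IPD_s": lag_of_best_match(left, right, max_lag) / fs,
        # Envelope delay: cross-correlate the mean-removed envelopes.
        "IED_s": lag_of_best_match(env_l - env_l.mean(), env_r - env_r.mean(), max_lag) / fs,
    }


if __name__ == "__main__":
    fs = 48000
    rng = np.random.default_rng(0)
    left = rng.standard_normal(fs // 2)                     # 0.5 s of noise
    right = 0.5 * np.concatenate([np.zeros(8), left[:-8]])  # 6 dB quieter, 8 samples later
    print(interaural_differences(left, right, fs))
```

In this example, the right ear receives the same noise 8 samples later and at half the amplitude. The sketch should therefore report an IID near 6 dB and both delays equal to 8 samples (about 0.17 ms at 48 kHz).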
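Section 2 describes a front end made of identical filter banks, squared-and-smoothed envelopes, and onset detectors that note the time and relative intensity of energy peaks. That front end can be sketched for one ear as follows. Several choices here are assumptions rather than the paper's. Brick-wall FFT bands with log-spaced edges stand in for cochlear filters, and the envelope uses a 5 ms moving average. The hypothetical `energy_peaks` detector reports local envelope maxima within 20 dB of the channel maximum, with intensity expressed relative to that maximum.

```python
import numpy as np


def log_band_edges(f_lo: float, f_hi: float, n_bands: int) -> np.ndarray:
    """Log-spaced band edges in Hz (a hypothetical channel layout)."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)


def filter_bank(x: np.ndarray, fs: float, edges: np.ndarray) -> np.ndarray:
    """Split x into bands using brick-wall FFT masks, a crude stand-in for
    cochlear filters. Returns an array of shape (n_bands, len(x))."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        channels.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return np.array(channels)


def channel_envelopes(channels: np.ndarray, fs: float, smooth_s: float = 0.005) -> np.ndarray:
    """Square each channel and smooth it with a moving-average window."""
    win = max(1, int(round(smooth_s * fs)))
    kernel = np.ones(win) / win
    return np.array([np.convolve(c ** 2, kernel, mode="same") for c in channels])


def energy_peaks(env: np.ndarray, fs: float, rel_threshold_db: float = -20.0):
    """Hypothetical onset detector: report (time in s, level in dB relative to the
    channel maximum) for each local maximum of one envelope above the threshold."""
    peak = env.max()
    if peak <= 0:
        return []
    floor = peak * 10.0 ** (rel_threshold_db / 10.0)
    events = []
    for i in range(1, len(env) - 1):
        if env[i] >= floor and env[i] > env[i - 1] and env[i] >= env[i + 1]:
            events.append((i / fs, float(10.0 * np.log10(env[i] / peak))))
    return events


if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.2 * fs)) / fs
    # A 1 kHz tone burst from 50 ms to 70 ms.
    x = np.where((t >= 0.05) & (t < 0.07), np.sin(2 * np.pi * 1000 * t), 0.0)
    edges = log_band_edges(100.0, 8000.0, 8)
    env = channel_envelopes(filter_bank(x, fs, edges), fs)
    k = int(np.argmax(env.max(axis=1)))
    print(f"strongest channel: {edges[k]:.0f}-{edges[k + 1]:.0f} Hz")
    print(energy_peaks(env[k], fs)[:3])
```

Applying the same functions to both eardrum signals yields peak times and levels for each channel, which can then be compared across the two ears.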
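The statistical estimator in Section 2 chooses the position most likely to have given rise to the measured interaural differences, and the abstract mentions spatial likelihood maps. Below is a minimal maximum-likelihood sketch over a grid of candidate positions. It assumes, without support from the paper, two things: each measured feature equals a stored per-position template plus independent Gaussian noise of known standard deviation, and all grid positions are equally likely a priori. In practice the templates might be computed from measured HRTFs. The four-position table in the example is invented purely for illustration.

```python
import numpy as np


def log_likelihood_map(observed: np.ndarray, templates: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Gaussian log-likelihood, up to an additive constant, of each candidate position.

    observed:  shape (F,)    measured interaural features
    templates: shape (P, F)  expected features at each of P candidate positions
    sigma:     shape (F,)    assumed noise standard deviation of each feature
    """
    z = (templates - observed[np.newaxis, :]) / sigma[np.newaxis, :]
    return -0.5 * np.sum(z ** 2, axis=1)


def most_likely_position(observed, templates, positions, sigma):
    """Return the maximum-likelihood position and the normalized likelihood map
    (the posterior over the grid under a uniform prior)."""
    ll = log_likelihood_map(observed, templates, sigma)
    post = np.exp(ll - ll.max())   # subtract the max for numerical stability
    post /= post.sum()
    return positions[int(np.argmax(ll))], post


if __name__ == "__main__":
    # Hypothetical grid: (azimuth, elevation) in degrees, with invented templates
    # of (IID in dB, interaural delay in s) for each position.
    positions = np.array([[0, 0], [30, 0], [-30, 0], [30, 40]])
    templates = np.array([[0.0, 0.0], [6.0, 2.5e-4], [-6.0, -2.5e-4], [4.0, 2.0e-4]])
    sigma = np.array([1.0, 5e-5])
    observed = np.array([5.8, 2.4e-4])
    best, post = most_likely_position(observed, templates, positions, sigma)
    print(best, np.round(post, 3))
```

Dividing each feature by its noise standard deviation puts decibels and seconds on a common scale. Positions with identical templates receive identical likelihoods, so the map shows directly where interaural differences alone leave the position ambiguous.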
