Phased spatial microphone arrays
Kragen Sitaker
kragen at pobox.com
Thu May 5 03:37:02 EDT 2005
(Haven't fact-checked this or checked for existing versions.)
Sometimes, you're most interested in the accuracy with which you
measure a single signal. But sometimes, you're more interested in the
number of signals you can acquire and process.
Recording audio that sounds good is hard. The usual approach is to
create a very quiet environment for recording in, but that costs a lot
of money and creates a nexus of control. Part of the problem is that
it's very difficult to distinguish the sound you want to record from
its own echoes and from other background noise. Another part is that
good microphones are expensive.
Having more spatially-separated channels available can mitigate this
problem, but many good microphones cost even more than a single good
microphone, and I suspect that you need a lot of microphones to do a
really good job. I wonder if enough channels could mitigate the
problem of each channel being marginal.
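
Here's a rough sketch of why more channels might make up for bad
ones. It assumes (my assumption, not something I've measured) that
every channel hears the same signal in phase and adds its own
independent Gaussian noise; in that case averaging N channels should
cut the noise by about sqrt(N). The function name and parameters are
made up for illustration.

```python
import math
import random

def averaged_noise_rms(n_channels, n_samples=2000, noise_sigma=1.0, seed=0):
    """RMS of the noise left over after averaging n_channels noisy copies.

    Assumes the wanted signal is identical on every channel, so it drops
    out of the error and only the averaged noise remains.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        mean_noise = sum(rng.gauss(0.0, noise_sigma)
                         for _ in range(n_channels)) / n_channels
        total += mean_noise * mean_noise
    return math.sqrt(total / n_samples)

if __name__ == "__main__":
    for n in (1, 100):
        # Expect roughly noise_sigma / sqrt(n) for each n.
        print(n, round(averaged_noise_rms(n), 3))
```

Real microphones sharing a room won't have fully independent noise,
so this is an upper bound on the improvement rather than a prediction.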
You can make a really marginal microphone from two sheets of aluminum
foil separated by some graphite powder, which conducts better when
it's under pressure. Suppose we want to do better than this and make
ten thousand really marginal microphones and use them all at once?
Well, you could make them in a 100x100 array on, say, a piece of
polyester film. You make a 100x100 array of circles of aluminum foil,
with 50 little aluminum-foil traces running between each pair of edge
circles out to the edge of the film. You have another piece of
polyester film with holes where all the circles are, and you glue it
on top to hide the traces. Then you pour graphite powder on top, so
it gets on all the circles, and put a sheet of aluminum foil over the
whole thing and tape the edges shut.
Now you take a microcontroller with ten thousand I/O pins and connect
one pin to each of these circles. You ground the sheet of foil and use
the I/O pins to measure the resistance to ground at each of these
circles frequently, to maybe 1-2 bits of precision. Now you have ten
thousand channels of spatially separated audio input, each of which
has a different pressure threshold for switching between "1" and "0",
due to process variation. I suspect you could calibrate this
microphone to get some sort of reasonable quality of sound out of it
--- maybe not 16 bits of quality, but at least ISDN quality, which
approximates 12 bits with mu-law encoding.
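
A minimal sketch of the calibration idea, under some loud
assumptions: each microphone is a pure 1-bit comparator, all of them
see the same pressure at a given instant (ignoring the spatial part
entirely), and calibration has already told us every threshold. Then
counting the ones works much like a thermometer-coded flash
converter with uneven steps. All the names here are hypothetical.

```python
import random

def make_thresholds(n, lo=-1.0, hi=1.0, seed=1):
    """Pretend process variation scatters thresholds uniformly over [lo, hi]."""
    rng = random.Random(seed)
    return sorted(rng.uniform(lo, hi) for _ in range(n))

def read_bits(pressure, thresholds):
    """One bit per microphone: 1 if the pressure exceeds its threshold."""
    return [1 if pressure > t else 0 for t in thresholds]

def estimate(bits, thresholds):
    """Recover the pressure from the count of ones and the calibrated thresholds."""
    k = sum(bits)
    n = len(thresholds)
    if k == 0:
        return thresholds[0]       # below every threshold: clipped low
    if k == n:
        return thresholds[-1]      # above every threshold: clipped high
    # Exactly k thresholds lie below the pressure, so it sits between
    # the k-th and (k+1)-th smallest; take the midpoint.
    return 0.5 * (thresholds[k - 1] + thresholds[k])

if __name__ == "__main__":
    thresholds = make_thresholds(10000)
    for p in (-0.5, 0.0, 0.3):
        print(p, estimate(read_bits(p, thresholds), thresholds))
```

With ten thousand thresholds the average step is about 1/5000 of the
full range, which is in the neighborhood of 12 bits, though the
uneven spacing means some levels are resolved worse than others.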
And you have spatial diversity, which gives you the opportunity to
localize the signals you want to record (like your own voice) and
screen out those you don't.
(I think you actually want to position the microphones somewhat
randomly, as in Berkeley's Allen Telescope Array, until they're
occupying most of the space.)
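
One standard way to use that spatial diversity is delay-and-sum
beamforming: undo the propagation delay you'd expect from a chosen
point, add the channels up, and see how much power comes out. This is
only a sketch, assuming ideal microphones, no echoes, whole-sample
delays, and a 48 kHz sample rate I picked arbitrarily; the function
names are mine.

```python
import math
import random

SPEED_OF_SOUND = 343.0   # m/s, room-temperature air
SAMPLE_RATE = 48000      # Hz, assumed

def delays_in_samples(mic_positions, point):
    """Propagation delay from `point` to each mic, rounded to whole samples."""
    return [round(math.dist(m, point) / SPEED_OF_SOUND * SAMPLE_RATE)
            for m in mic_positions]

def simulate(mic_positions, source, signal):
    """Each mic hears the same signal, shifted by its own delay."""
    delays = delays_in_samples(mic_positions, source)
    pad = max(delays)
    return [[0.0] * d + list(signal) + [0.0] * (pad - d) for d in delays]

def delay_and_sum_power(channels, mic_positions, focus):
    """Align channels as if the source were at `focus`; return mean output power."""
    delays = delays_in_samples(mic_positions, focus)
    base = min(delays)
    shifts = [d - base for d in delays]
    length = min(len(ch) - s for ch, s in zip(channels, shifts))
    total = 0.0
    for i in range(length):
        s = sum(ch[i + sh] for ch, sh in zip(channels, shifts)) / len(channels)
        total += s * s
    return total / length

if __name__ == "__main__":
    rng = random.Random(0)
    mics = [(0.1 * i, 0.0) for i in range(16)]       # 16 mics on a 1.5 m line
    voice = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    channels = simulate(mics, (0.5, 1.0), voice)
    print("focused on source:", delay_and_sum_power(channels, mics, (0.5, 1.0)))
    print("focused elsewhere:", delay_and_sum_power(channels, mics, (1.5, 0.3)))
```

Focusing on the true source position should give noticeably more
output power than focusing somewhere else, which is the handle you'd
use both to find a talker and to reject everything that isn't them.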
Of course you can't have ten thousand I/O pins on a chip package, but
you really do need ten thousand wires from the ten thousand
microphones. You can't do row-column scanning in the usual way ---
consider when two of the rows are shorted to the same column. Now you
have two indistinguishable rows, and a lot of signals will be in the
wrong spatial location. Since the microphones are resistive and
varying, every row will be coupled to every column, and thence to
every other row. You need at least a transistor to select the signal
from each microphone (perhaps with row-column encoding). You could
call this an active-matrix microphone. Ideally you could fabricate
the ten thousand transistors on the film out of polymer
semiconductors.
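
To make the coupling problem concrete, here's the smallest case I
can think of: a 2x2 passive matrix of resistive microphones, measured
between one row wire and one column wire with the other two wires
left floating (that measurement setup is my assumption). The element
you want is in parallel with a sneak path through the other three.

```python
def parallel(a, b):
    """Resistance of two resistors in parallel."""
    return a * b / (a + b)

def measured_2x2(r, row, col):
    """Resistance seen between row wire `row` and column wire `col`.

    r[i][j] is the true resistance of the microphone at row i, column j.
    The other row and column wires are assumed to be floating.
    """
    other_row, other_col = 1 - row, 1 - col
    direct = r[row][col]
    # Sneak path: selected row -> other column -> other row -> selected column.
    sneak = r[row][other_col] + r[other_row][other_col] + r[other_row][col]
    return parallel(direct, sneak)

if __name__ == "__main__":
    # Element (0, 0) is at rest (1 megohm); its three neighbours are
    # squeezed (1 kilohm each).
    r = [[1e6, 1e3],
         [1e3, 1e3]]
    print(round(measured_2x2(r, 0, 0)))   # about 3 kilohms, not 1 megohm
```

So the quiet microphone reads as if it were being squeezed, purely
because of its neighbours, and in a 100x100 array there are vastly
more sneak paths than this.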
(What are the limits? Maybe you could have 100 or 300 or 1200
microphones per inch, and have several megachannels of input.)
The trend for things like this lately has been to try to put
everything on a tiny (DLP or sensor) chip and use optics or something
to get the data out into the larger physical world. So if you had
some panel that changed color or reflectivity in response to pressure,
you could just take a continuous video of the panel. 60 frames per
second doesn't sound like a lot for audio, but that's 17 ms per frame,
and sound travels a bit over a foot per millisecond --- so if your
panel is about 19 feet or larger along the direction the wave is
traveling, each frame still catches the whole wave spread out across
the panel and you don't miss anything. Otherwise you might be able to
abuse Fourier analysis to fill in the gaps. Shining a light on a Mylar
emergency blanket seems like it might work, although it gives you kind
of a high-pass filter, because it shows you the point-to-point
differences, not the point-by-point values.
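
The frame-rate arithmetic is easy to check. This little sketch
assumes sound at about 343 m/s (dry air near room temperature) and
just computes how far a wavefront moves during one video frame, which
is the panel depth you'd need so that consecutive frames don't leave a
gap.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at about 20 C
FEET_PER_METRE = 1.0 / 0.3048

def sound_travel_per_frame(fps):
    """Return (frame time in ms, metres travelled, feet travelled) for one frame."""
    frame_s = 1.0 / fps
    metres = SPEED_OF_SOUND_M_S * frame_s
    return frame_s * 1000.0, metres, metres * FEET_PER_METRE

if __name__ == "__main__":
    for fps in (30, 60, 240):
        ms, metres, feet = sound_travel_per_frame(fps)
        print(f"{fps} fps: {ms:.1f} ms/frame, {metres:.2f} m, {feet:.1f} ft")
```

At 60 frames per second that comes to roughly 5.7 m per frame, a
little under 19 feet.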
I suppose I should get a clue about the actual systems in the real
world that already do something like this, with many fewer microphones
--- Howard Dean's famous scream-making noise-canceling microphone, and
the ten-year-old Polycom full-duplex speakerphones are the examples
that come to mind.