Why virtual mics matter — acoustic spatial resolution in a 3D space

Written by Erica Yorga | Feb 10, 2021 1:00:00 PM

This blog is part of a series that takes a deep dive into the science behind Microphone Mist™ technology. The series also includes “Continuous echo cancellation. This is what clarity sounds like,” “A groundbreaking approach to gain control that cuts conference call annoyance” and “Unified coverage map: a radically better technology for hybrid spaces.” Each piece originally appeared in audioXpress magazine.

Thousands of virtual microphones. It’s what Microphone Mist technology offers, and it sounds impressive. But what does it really mean? What impact do thousands of virtual microphones really have on your daily conference calls?

Thousands of virtual microphones add up to more than just extra pickup points in the room. They represent a conceptual change to the status quo of physical microphones and beamformer systems. They alter our basic understanding of acoustic pickup in conference rooms and other meeting spaces.

Using an approach that focuses on discrete three-dimensional locations rather than broad coverage zones, we can deliver optimized analysis and solutions that make audio conferencing systems more precise and capable. Put another way, if we stop relying on broad coverage zones that aim in the direction of a desired sound source and instead analyze sound at many thousands of spatial points (locations) in a room, we have the opportunity for a reset. A virtual microphone is any one of those thousands of predetermined points in three-dimensional space that the array can specifically focus on.
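
To make "focus on a spatial point" more concrete, here is a minimal sketch of classic delay-and-sum focusing, a textbook array technique. It is a generic illustration only and not a description of how Microphone Mist technology works internally. The microphone positions, the focus point and the speed-of-sound value are all assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def focusing_delays(mic_positions, focus_point, c=SPEED_OF_SOUND):
    """Return per-microphone delays (seconds) that time-align sound from focus_point.

    Generic delay-and-sum focusing, shown for illustration only.
    """
    # Time for sound to travel from the focus point to each physical mic.
    travel = [math.dist(m, focus_point) / c for m in mic_positions]
    # Delay the earlier arrivals so every channel lines up with the latest one.
    latest = max(travel)
    return [latest - t for t in travel]


# Hypothetical 4-element linear array along x at 1 m height (meters).
mics = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.0), (0.3, 0.0, 1.0)]
virtual_mic = (0.15, 2.0, 1.2)  # one hypothetical point in the room

delays = focusing_delays(mics, virtual_mic)
for mic, d in zip(mics, delays):
    print(f"mic at {mic}: delay {d * 1e6:.1f} microseconds")
```

Each distinct point in the room yields its own set of delays, which is why a point in space, and not just a direction, can serve as a pickup location.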

By applying a new approach that analyzes a room three dimensionally, audio characteristics can be determined for each one of thousands of locations in a room. That wealth of sound information can then be independently evaluated for desirability and quality, and automatic adjustments to the system can be made accordingly.

The audio conferencing status quo

There are two classical approaches to audio conferencing microphone pickup.

One approach uses one or more discrete microphones (see Figure 1). Systems based on this approach may range from simple, such as a lapel mic attached to the person in the room giving a presentation or lecture, to complex, such as a collection of distributed gooseneck microphones on a table or pendant-style microphones suspended from the ceiling.

This classical approach aims to provide specific coverage to a small target zone and/or broad coverage to a larger zone or zones defined by the polar pattern of the microphones. Although audio pickup performance can be good when talkers are close to the microphone, it degrades significantly when talkers move away from the microphones or change their orientation to them. Moving talkers can be especially hard to pick up, because when they stand to walk and talk, they will most likely leave the coverage zone of the tabletop microphones. The talkers are constrained to the limited coverage of the microphone system. This approach may not be suitable for meeting situations that are dynamic and that may require talkers to move to a different location, such as a presentation device, to illustrate a point.

If multiple sound sources are colocated within the coverage zone of any one microphone, the system cannot discriminate between them. That may not be a problem if all those sound sources are desired, but it will be a problem if one or more of the sound sources is a noise that is not desired, such as an HVAC system or the sound of somebody typing on a keyboard or shuffling papers. The microphone will pick up all the sounds in that coverage zone and postprocessing will be required to deal with the unwanted noise.

Figure 1: Tabletop microphone example

The other common approach to audio conferencing uses beamforming arrays. These arrays, each composed of multiple physical microphones working together, use preconfigured coverage zones based on expected room use (see Figure 2). The objective of the array is to amplify the sounds within the target zone with higher gain than is typical in a system that uses discrete mics, and to attenuate sounds that are outside the target zone so that reverberation and undesired noises are significantly reduced. These systems are called beamformers because each zone always begins at the location of the microphone array and extends outward, widening from there and typically extending to a room boundary.

A beamforming array may have just one zone (in the case of a beam-tracking array) or as many as a dozen or more zones configured to cover the room based on expected use (in the case of a multizone array). Because the beams can be narrowed, the system can reduce unwanted noise sources by targeting specific coverage zones and excluding others. However, even a narrowed beam does not offer enough spatial granularity to deliver high spatial resolution. That is, a beam deals with sound based on direction, not location. If an undesired noise source is within the beam, the beamforming system has the same limitations that a system based on discrete microphones has. Undesirable noise sources still need to be dealt with by postprocessing using specialized noise filters and algorithms. If talkers are located outside the beam, such as at a display device, or are moving throughout the room, there is no way for them to know whether they are within a beamformer zone or not. With a discrete microphone system, the talkers can see the location of the physical mics. However, with a beamformer the talkers cannot see where the beamformer is pointing or is configured to point, so they may inadvertently leave the preconfigured coverage zones.
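
The "direction, not location" point can be illustrated with a little geometry. In the sketch below, two sources sit at different places in the room but along the same line from the array, so they share one bearing. All coordinates are hypothetical, and the `bearing` helper is written for this example only.

```python
import math


def bearing(array_pos, source_pos):
    """Azimuth and elevation (degrees) of source_pos as seen from array_pos."""
    dx, dy, dz = (s - a for s, a in zip(source_pos, array_pos))
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


array_pos = (0.0, 0.0, 0.0)  # hypothetical array location (meters)
talker = (1.0, 2.0, 0.5)     # hypothetical talker
hvac_vent = (2.0, 4.0, 1.0)  # hypothetical noise source, twice as far along the same line

print("talker bearing:", bearing(array_pos, talker))
print("vent bearing:  ", bearing(array_pos, hvac_vent))
print("distance between them (m):", math.dist(talker, hvac_vent))
```

A system that describes sound only by bearing sees these two sources as one, even though they are more than two meters apart.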

Figure 2: Beamforming array example

Both of these classical approaches are based on preconfigured broad coverage zones. Within those zones, they identify the loudest sound sources in a room, which may or may not be the talker, and attempt to optimize the audio performance. In each of the classical approaches, the full-room acoustic coverage may not be adequate because acoustic spatial resolution and density are low and continuous acoustic monitoring of the entire room is usually not feasible. Also, each of the classical approaches is based on the expected use of the room. If actual use of the room deviates from the expected use, the system typically has to be reconfigured. Obtaining detailed information about the acoustic properties and precise location of all sound sources in a room is difficult with broad coverage approaches.

In contrast, Nureva has developed a unique and innovative approach that covers the whole area by analyzing the three-dimensional space in its entirety with a high degree of spatial resolution. That allows the microphone system to obtain precise acoustical information at thousands of individual locations. Since the measured spatial granularity is very high, all acoustic sound sources regardless of location can be identified and managed concurrently. As a result, the system can provide a full-room acoustic perspective that optimizes the audio pickup performance to a level that is simply not feasible in lower spatial resolution systems.

A novel approach to acoustic spatial resolution

Nureva realized that to improve the audio experience in meeting spaces, a new approach was needed. Our objective? Obtain better, more precise acoustical information by measuring sound characteristics in many discrete locations in the whole of an acoustic space — rather than simply optimizing an active sound source anticipated to come from one or more directions. We needed a new and innovative method to acoustically resolve a space to a finer degree than other audio conferencing systems.

To achieve this, we needed to look at a room as a three-dimensional acoustic space (see Figure 3). Nureva has developed patented microphone array technology (called Microphone Mist technology) that can resolve an acoustic space in three dimensions, using thousands of evenly distributed virtual microphones. The result? A full-coverage grid that provides precise acoustic information at a higher resolution and density coverage than can be achieved using the classical approaches.

Figure 3: Microphone Mist technology

The concept of high spatial resolution is helpful in understanding what makes Nureva’s approach to the acoustic space unique. Though it’s a new approach to audio conferencing, it is similar to high-resolution technologies in other fields.

Do we need higher resolution?

When we buy a TV or camera, image resolution is one of the performance specs we know is important. A 4K picture is better than a picture rendered in 1080p. Higher resolution produces a better experience. It’s the same with other formats and technologies: colors, music, images and even telescopes. Higher resolution means more, and higher-quality, information and processing, which allows a better experience.

For example, Figure 4 captures the effect of bit depth on the color palette. When an image has a bit depth of 2, we get four color options. As we increase to 14 bits of resolution, we get 16,384 color options. As the bit depth increases, so does resolution of the color palette within the image, meaning that finer color details can be resolved. It becomes clear that the use of a higher bit depth description of color is preferable for displaying and analyzing images.
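
The arithmetic behind those numbers is simply two raised to the bit depth. A short sketch follows; the function name is ours, and the 8-bit row is added only for comparison.

```python
def color_options(bit_depth):
    """Number of distinct values that bit_depth bits can represent."""
    return 2 ** bit_depth


for bits in (2, 8, 14):
    print(f"{bits} bits -> {color_options(bits):,} color options")
```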

Figure 4: Color bit depth

In Figure 5, the low-resolution 4 dpi image of the guitar is blocky compared to a high-resolution 300 dpi image. Increasing the resolution means a more precise and detailed image can be displayed. Details that are obscured and blurry in the low-resolution image become very evident with a higher resolution image. We can perform better processing and analysis on the image data.

Figure 5: Image resolution

The same thing applies to music. In Figure 6, the table shows that a 4-bit recording has less resolution than one done in 24 bits. With a higher bit depth, we gain a better signal-to-noise ratio, higher dynamic range, less quantization error and so on.
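
As a rough guide to how much bit depth buys, the textbook figure for an ideal quantizer driven by a full-scale sine wave is about 6.02 dB of signal-to-noise ratio per bit, plus about 1.76 dB. The sketch below applies that standard formula. It is a theoretical upper bound, not a reproduction of the values in Figure 6, and real converters fall short of it.

```python
import math


def ideal_dynamic_range_db(bit_depth):
    """Theoretical SNR (dB) of an ideal quantizer for a full-scale sine wave."""
    # 20*log10(2**N) is about 6.02 dB per bit; 10*log10(1.5) adds about 1.76 dB.
    return 20 * math.log10(2 ** bit_depth) + 10 * math.log10(1.5)


for bits in (4, 16, 24):
    print(f"{bits}-bit: about {ideal_dynamic_range_db(bits):.1f} dB")
```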

Figure 6: Music resolution

The advantages of using greater bit depth and the resulting increase in data resolution are well understood and shared across digital platforms. These same benefits also apply to audio microphone systems that subdivide an acoustic space into finer and finer granular detail. If we resolve a space to smaller and smaller acoustic zones, we can describe and understand the sound sources and space better, resulting in optimized processing of sound sources based on their own individual acoustic characteristics.

Resolution in the acoustic space

When it comes to audio conferencing, it is useful to divide a space into acoustic regions to help provide sufficient coverage. The room can be divided into physical microphone coverage zones or virtual microphone locations. In the case of discrete microphones and beamformers, each zone starts from the center of the aperture of the mic system and is defined by the shape of the polar plot for each configured zone. In the case of Microphone Mist technology, each region is centered on the location of each of the virtual microphones in a three-dimensional space. Each location can be assigned an individual spatial resolution value. The total resolution value indicates how many points of spatial granularity the microphone system is able to resolve in the three-dimensional space.

Figures 7a and 7b show a single discrete microphone and distributed discrete microphones in a space. If we assign a spatial zone value to each coverage zone, which in the case of a discrete mic starts at the center of the physical mic, we can see that a single mic has one individual spatial zone and a total of one spatial zone in the space (Figure 7a). In the case of multiple discrete microphones, three spatial zones are shown for a total spatial resolution value of 3 (Figure 7b). The coverage zones are large, yet the microphone systems are not able to discriminate individual sound sources within a single zone. This makes for a low total spatial resolution value. Just as with color and images, the more we can divide up the acoustic space, the more spatial resolution we have in measuring and describing it.

Figure 7a: Discrete mic spatial resolution example

Figure 7b: Discrete mic spatial resolution example

Figures 8a and 8b show a generalized beamformer coverage pattern. The same type of quantification can be applied by assigning an individual spatial value to each zone. Figure 8a illustrates a three-zone system that results in a spatial resolution value of 3. Even in the most complex implementations, coverage zones are usually limited to fewer than a few dozen; this example has six zones (Figure 8b), for a spatial resolution value of 6. It’s clear that beamformers produce a small total spatial resolution value to describe the whole space.

Figure 8a: Beamformer spatial resolution example

Figure 8b: Beamformer spatial resolution example

Both systems are able to find the sound source in each coverage zone the same way a 4 dpi resolution image can display broad colors as fuzzy blobs. But discriminating sound sources within any one zone is problematic. And if a sound source is outside the coverage zone, it’s ignored entirely. The inability of either classical approach to divide the space into a high-density acoustic grid limits how the microphone system can identify sound sources and their properties.

So, do we want a blurred acoustic picture with lower spatial resolution? Or can we enjoy the technology benefits of high-resolution information and data in an acoustic space?

Why thousands of virtual microphones matter

Dividing a space into smaller areas results in more precise focus. It’s the only way to really understand what’s going on in a space. The ideal scenario is to create thousands of evenly spaced locations — which is exactly what Microphone Mist technology does.

Figure 9 shows a system with thousands of individual virtual microphone locations which, in the case of Microphone Mist technology, results in a total spatial resolution of 8,192. This is because, as in other approaches, each location is assigned a spatial resolution value. Microphone Mist technology creates thousands of concurrent virtual microphone zones. When you think about the benefits of resolution in colors and images, it is clear that greater resolution is critical for high precision and detailed collection of sound information. In the acoustic realm, Microphone Mist technology can divide a room into very finely spaced acoustic locations.
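
To put the spatial resolution values side by side, the sketch below builds an evenly spaced three-dimensional grid and counts its points. The article gives the total of 8,192 locations but not how they are arranged, so the 32 x 32 x 8 layout and the room dimensions here are purely hypothetical, picked only because they multiply out to 8,192.

```python
from itertools import product


def virtual_mic_grid(room_size, counts):
    """Evenly spaced grid points (cell centers) inside a box-shaped room."""
    axes = []
    for length, n in zip(room_size, counts):
        step = length / n
        axes.append([(i + 0.5) * step for i in range(n)])
    return list(product(*axes))


room = (6.0, 6.0, 2.4)  # hypothetical room: width, depth, height in meters
counts = (32, 32, 8)    # hypothetical layout; 32 * 32 * 8 = 8192

grid = virtual_mic_grid(room, counts)

# Total spatial resolution values as described for Figures 7a to 9.
resolution = {
    "single discrete mic (Figure 7a)": 1,
    "three discrete mics (Figure 7b)": 3,
    "three-zone beamformer (Figure 8a)": 3,
    "six-zone beamformer (Figure 8b)": 6,
    "virtual microphone grid (Figure 9)": len(grid),
}
for name, value in resolution.items():
    print(f"{name}: {value}")
```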

Figure 9: High-spatial-resolution coverage using thousands of virtual microphone locations

By dividing the acoustic space into a much higher resolution spatial three-dimensional grid, our system can monitor and analyze each location based on its own characteristics. This means that at each virtual microphone location, our system measures, analyzes, targets, processes and reports about that location to develop a comprehensive soundscape presentation of the entire space.

Because the system is continuously gathering acoustic information about each location, it can resolve individual sound sources in relative three-dimensional space within the whole coverage area. It’s not limited to optimizing a single sound source and ignoring what else is going on in the rest of the room. When a person is speaking and typing, Microphone Mist technology has the spatial resolution to focus on the location close to the talker’s mouth and de-emphasize the location of the keyboard. The result is that the system can discriminate between the locations of desirable and undesirable sounds. Talkers at the boundary of the room, such as those standing at a display device, do not have to worry if they are in the coverage zone or not. If talkers stand to walk, they can continue talking without considering how the microphone system is configured. As talkers walk, they are transitioned to each individual virtual microphone in real time through small seamless transitions because the room is covered with evenly distributed virtual microphones. This ensures consistent high-quality microphone pickup performance regardless of where talkers walk in the room. Talkers can sit, bend and in effect move naturally in the room without fear of dropping or fading out, which means natural speech, gestures and movements.
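
The idea of small transitions between neighboring virtual microphones can be pictured with a simple nearest-cell lookup on an evenly spaced grid. This is only a conceptual sketch: the article does not describe how the system actually selects or blends locations, and the room size, grid layout and walking path below are hypothetical.

```python
def nearest_grid_index(position, room_size, counts):
    """Index (ix, iy, iz) of the grid cell that contains position, clamped to the room."""
    index = []
    for coord, length, n in zip(position, room_size, counts):
        step = length / n
        i = int(coord // step)
        index.append(min(max(i, 0), n - 1))
    return tuple(index)


room = (6.0, 6.0, 2.4)  # hypothetical room in meters
counts = (32, 32, 8)    # hypothetical grid layout

# A talker at mouth height walking along x in 10 cm steps (hypothetical path).
path = [(1.0 + 0.1 * k, 3.0, 1.6) for k in range(6)]
for p in path:
    print(f"talker at x = {p[0]:.1f} m -> grid cell {nearest_grid_index(p, room, counts)}")
```

Because neighboring cells are close together, each step of the walk moves the talker to the same or an adjacent cell, with no large jump between coverage zones.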

When it comes to other sources of unwanted noise, such as HVAC, the system can treat them uniquely with our Intelligent Sound Targeting technology. This technology targets and focuses on the specific location of a sound source in the room, even in complex acoustic environments with multiple sound sources. Each active sound source is referenced to a location in a three-dimensional space, and proprietary logic-based processing decisions can be made on how and when to focus the system on a new sound source.

The higher spatial resolution of our system makes more precise information available, which supports better decisions for optimum microphone pickup and postprocessing data analysis.

Ready for the future

So, yes, thousands of virtual microphones do matter — they change how we understand acoustic spaces. In uncertain times, where physical distancing and other COVID requirements demand more of our spaces than before, that’s more important than ever.
