OUR TECHNOLOGY

Hearing what matters most

The human auditory system plays a pivotal role in our ability to perceive the direction of sound. It also grants us the capacity to focus on one sound while filtering out others, a phenomenon commonly known as the "cocktail party effect." These inherent capabilities have historically served as valuable sources of information and vital survival mechanisms for both humans and animals.

In the realm of digital innovation, Squarehead has successfully replicated and enhanced aspects of this auditory capability, harnessing its potential to provide significant insights and intelligence in sectors where sound plays a crucial role in decision-making. By combining numerous “ears” (microphones), we have achieved digital, automated sound source localization. Furthermore, an integrated camera within the microphone array provides a visual reference. This proprietary technology empowers us to visualize sound in space, serving as an “acoustic camera”.

Simultaneously, we have embarked on digitizing human experience and memory. Drawing on the knowledge acquired over a lifetime, humans possess the remarkable ability to discern whether a sound is favorable or unfavorable, dangerous or benign. Leveraging advanced machine learning techniques applied to spatial sound, Squarehead makes audio information both more intelligible and more actionable.

What is beamforming?

EXPLANATION FOR TECH-PEOPLE

In order to accurately perceive the direction of an incoming sound, beyond its frequency content and loudness, a minimum of two distinct observation points is required. This necessitates the placement of at least two microphones in different positions. While a single microphone alone is generally omnidirectional, the combination of two microphones grants us the ability to discern the direction a sound comes from.
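To make this concrete, here is a minimal sketch of how the time difference of arrival between two microphones translates into a direction. The sample rate, microphone spacing, and cross-correlation approach below are illustrative assumptions, not a description of our products.

```python
import numpy as np

# Illustrative only: estimate a source direction from the time difference
# of arrival (TDOA) between two microphones, assuming a plane wave in a
# free field and a simple cross-correlation delay estimate.

FS = 48_000             # sample rate in Hz (assumed)
MIC_SPACING = 0.20      # distance between the two microphones in metres (assumed)
SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def estimate_angle(left: np.ndarray, right: np.ndarray) -> float:
    """Angle in degrees relative to broadside; the sign tells you which
    microphone the sound reached first."""
    # Cross-correlate the channels to find the lag (in samples) at which
    # they line up best.
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)
    delay = lag / FS  # seconds
    # Geometry of a plane wave: delay = spacing * sin(angle) / c.
    sin_theta = np.clip(delay * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Synthetic check: the same noise burst, reaching the left mic 5 samples
# earlier, comes out as a negative angle (toward the left mic).
rng = np.random.default_rng(0)
burst = rng.standard_normal(4096)
print(f"estimated angle: {estimate_angle(burst, np.roll(burst, 5)):.1f} degrees")
```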

An array consists of multiple microphones strategically positioned according to design criteria aimed at achieving the desired level of directivity. Typically, larger physical arrays exhibit greater directivity, resulting in enhanced precision in pinpointing and amplifying sound.
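As a rough illustration of this size-versus-directivity trade-off, the textbook half-power beamwidth approximation for a uniform line array, roughly 0.886 × wavelength / aperture (in radians) at broadside, can be evaluated for a few aperture sizes. The numbers below are generic, not specifications of our arrays.

```python
import numpy as np

# Illustrative only: approximate -3 dB beamwidth of a uniform line array
# at broadside, using the small-angle rule of thumb
#   beamwidth ~ 0.886 * wavelength / aperture  (radians).
# The estimate gets rough when the aperture is close to a wavelength.

SPEED_OF_SOUND = 343.0  # m/s

def beamwidth_deg(aperture_m: float, freq_hz: float) -> float:
    wavelength = SPEED_OF_SOUND / freq_hz
    return np.degrees(0.886 * wavelength / aperture_m)

for aperture in (0.25, 0.5, 1.0, 2.0):  # array lengths in metres
    print(f"{aperture:4.2f} m aperture -> "
          f"{beamwidth_deg(aperture, 1000.0):5.1f} degrees wide at 1 kHz")
```

Doubling the aperture roughly halves the beamwidth, which is why larger arrays pinpoint sound more precisely.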

A microphone array processing audio with an algorithm known as beamforming acts as a spatial filter. By feeding the beamformer the spatial position of the desired sound source, the sound from that position passes through the filter unaltered, while surrounding noise is attenuated. This yields a higher signal-to-interference ratio at the microphone array's output and creates the impression of zooming in on the sound of interest.
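A minimal delay-and-sum beamformer, the simplest member of this family, shows the spatial-filtering idea in action. The array geometry, signal parameters, and integer-sample delays below are illustrative simplifications, not our proprietary algorithm.

```python
import numpy as np

# Illustrative delay-and-sum beamformer for a uniform line array. Steering
# toward a direction time-aligns that source across channels so it adds
# coherently, while off-axis sound averages out.

FS = 48_000
SPEED_OF_SOUND = 343.0
MIC_POSITIONS = np.arange(8) * 0.05  # 8 mics, 5 cm apart, on one axis (assumed)

def delay_and_sum(channels: np.ndarray, steer_deg: float) -> np.ndarray:
    """channels: (num_mics, num_samples); returns the beamformed signal."""
    out = np.zeros(channels.shape[1])
    for pos, ch in zip(MIC_POSITIONS, channels):
        # Time a plane wave from steer_deg needs to reach this mic...
        delay = pos * np.sin(np.radians(steer_deg)) / SPEED_OF_SOUND
        # ...undone by advancing the channel. Rounding to whole samples
        # (and np.roll's wraparound) is fine for a toy; real systems use
        # fractional delays or frequency-domain weights.
        out += np.roll(ch, -int(round(delay * FS)))
    return out / len(MIC_POSITIONS)

# Toy scene: a 1 kHz tone arriving from 20 degrees, plus per-mic noise.
rng = np.random.default_rng(1)
t = np.arange(FS) / FS
delays = MIC_POSITIONS * np.sin(np.radians(20.0)) / SPEED_OF_SOUND
mics = np.stack([np.sin(2 * np.pi * 1000.0 * (t - d)) for d in delays])
mics += 0.5 * rng.standard_normal(mics.shape)

print("power on target :", np.mean(delay_and_sum(mics, 20.0) ** 2))
print("power off target:", np.mean(delay_and_sum(mics, -60.0) ** 2))
```

Steering at the source yields markedly more output power than steering away from it, which is the "zoom" effect described above.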

Arrays are also employed for locating sound sources in both space and time. The array functions as a multi-channel system, even at the output stage. This is made possible by digital beamforming, which allows the same array to be focused in multiple spatial directions simultaneously. This process involves forming beams in various directions, hence the term "beamforming." Employing beamforming for multiple directions within the array's coverage area facilitates sound source tracking, creation of acoustical images, remote monitoring of multiple acoustic scenes, determination of specific sound event locations, and classification of sound events.

As with filtering in general, there exists a vast pool of beamforming algorithms to choose from, with the selection typically driven by specific application requirements. These requirements may include the desired level of directivity, minimum pick-up range, signal-to-noise ratio at the system's output, and so forth.
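Because the steering happens in software, the same recording can be pointed in many directions after the fact. Sweeping the steering angle and measuring each beam's output power is, in essence, a one-dimensional acoustical image; a two-dimensional grid over azimuth and elevation works the same way. This sketch reuses delay_and_sum and the simulated mics recording from the sketch above.

```python
import numpy as np

# Illustrative only: scan beams across the coverage area and take the power
# of each one. The peak of this "acoustic image" locates the source.
# Requires delay_and_sum() and `mics` from the previous sketch.

scan_angles = np.linspace(-90.0, 90.0, 181)
powers = [np.mean(delay_and_sum(mics, a) ** 2) for a in scan_angles]
print("strongest direction:", scan_angles[int(np.argmax(powers))], "degrees")
```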

EXPLANATION FOR LAYPEOPLE

OK, hear we go… (pun intended). You know how, when you hear a sound, your body may react to it? For instance, when a car honks, you usually turn towards the sound to see what that was all about. The reason you know where the sound comes from, and turn toward it, is that you have two ears. These work as two microphones and tell you the direction of what you hear.

So what we do is take a lot of microphones, place them in a really trippy pattern, and call the result an array. The more microphones you have in this so-called array, the more precise and clear the sound you get. In addition, you can aim at what you want to listen to and turn up that sound.

So this array records sound from all its microphones and mixes it together using an algorithm. This process is called (drumroll) beamforming. This means that all the microphones have been combined into one supermicrophone. The supermicrophone gives us superhearing, which means that when we listen in a specific direction with it, the sound from that direction becomes super clear. This happens because all the surrounding sounds are super muted.

Now over to space and time (calm down, no one understood Interstellar). Space = a room. Time = … wait, did anyone ever figure this out? Anyway, the point is that we also want to see the sound. And we also want to gather sound from different directions at the same time. And find the position of the sound. And track the sound. And you are probably wondering, “how can I do all this at once?” You don’t, it’s all digital, aka you don’t have to do anything. Except maybe turn it on, but after that it does everything for you. At the same time. You don’t even have to be in the same room. AND you can choose different algorithms based on whether you want to find the position of the sound, what the sound really is (is it a bird, is it a plane, etc.), or something else. It’s honestly really spectacular and one-of-a-kind. “And boy, have we patented it”.

Try it yourself

Try our array simulator, illustrating the functionality and benefits of beamforming. Swap between a regular microphone and a microphone array, and drag the beam around to hear the difference.

Machine learning

Once beamforming is completed (finally), effectively filtering out noise and interference in the spatial domain, the remaining sound can be subjected to further analysis and processing. As in the human auditory system, determining the direction of a sound is only one part of situational awareness. Understanding what a sound means and where it comes from requires prior familiarity with that particular sound, a process referred to as training within the realm of machine learning (here we go again…).

The field of machine learning has made remarkable progress in recent years, yielding impressive outcomes. Sound is classified by extracting meaningful features from microphone array data and training a model to recognize patterns within those features.

The journey from raw data to a comprehensively defined audio class varies depending on the choice of algorithm. Broadly speaking, the common steps encompass data collection, data annotation, preprocessing of data (including data cleaning and augmentation), feature calculation and extraction, model training, model evaluation, and inference, which lets the model generalize to previously unseen sound samples. A toy sketch of these steps follows.
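Purely as an illustration of those steps, the sketch below trains a classifier to separate synthetic "tonal" clips from "broadband" ones using generic spectral features and an off-the-shelf model. Every choice in it (features, classes, classifier) is an assumption for the example, not a description of our pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

FS = 16_000  # sample rate of the toy clips (assumed)

def extract_features(clip: np.ndarray, n_bands: int = 32) -> np.ndarray:
    """Crude spectral features: log-energy in equal-width frequency bands."""
    spectrum = np.abs(np.fft.rfft(clip)) ** 2
    return np.log([band.sum() + 1e-12 for band in np.array_split(spectrum, n_bands)])

# Steps 1-2, data collection and annotation: synthetic stand-ins for real
# labeled recordings. Step 3, augmentation: added background noise.
rng = np.random.default_rng(0)
t = np.arange(FS) / FS
clips, labels = [], []
for _ in range(200):
    if rng.random() < 0.5:
        clips.append(np.sin(2 * np.pi * rng.uniform(200, 2000) * t))
        labels.append("tonal")
    else:
        clips.append(rng.standard_normal(FS))
        labels.append("broadband")
clips = [c + 0.3 * rng.standard_normal(FS) for c in clips]

# Step 4, feature extraction.
X = np.stack([extract_features(c) for c in clips])

# Steps 5-6, model training and evaluation on held-out data.
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, model.predict(X_te)))

# Step 7, inference on a previously unseen sample.
print("prediction:", model.predict([extract_features(np.sin(2 * np.pi * 440 * t))])[0])
```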

The specific techniques and algorithms employed in sound classification can vary depending on the inherent characteristics of the problem and the availability of data. The domain of sound classification is vast, continuously inspiring researchers to explore novel methodologies aimed at enhancing accuracy and efficiency.

The application of machine learning to microphone array data necessitates meticulous data collection employing microphone arrays. The path towards a well-curated and labeled dataset demands a high level of expertise and dedication, along with access to a diverse range of samples.

Squarehead has made substantial investments in machine learning and has built a proficient team of ML experts. In a short span of time, significant strides have been made in equipping our acoustic sensors with intelligence, empowering them to not only amplify and localize sound, but also classify a wide spectrum of audio events of paramount importance to our clientele.

APPLICATION AREAS