SonixCycle — browsing Foley sounds by meaning

A tool to browse large collections of Foley sounds using perceptual and semantic content. Published at Audio Mostly 2016 — patented as EP3430535.

October 2016 · UMons — numediart Institute · Audio Processing · Machine Learning · C/C++ · CUDA

A search problem for sound designers

Sound designers who produce Foley for films, TV and games typically maintain libraries of tens of thousands of recorded sounds — footsteps on gravel, doors slamming, cloth rustling, glass breaking, you name it. The problem is very simple to state: when you need “a door slamming that sounds tense”, how do you find it among 80,000 files?

Traditional solutions rely on tags typed by a human at ingest time. This is slow, inconsistent across a team, and blind to everything that isn’t in the tag vocabulary. This R&D project — conducted at the numediart Institute of UMons in partnership with Dame Blanche SA and CETIC — aimed to complement text search with perceptual and semantic similarity, so the designer can browse the library the way it actually sounds.

The pipeline

The core idea was to index each audio file at the level of its local perceptual content, not just its filename. For that we built a pipeline around the Apache Solr search engine:

Figure: the SonixCycle pipeline.
Each audio file is segmented into 10 ms windows, and features are extracted per segment: MFCCs, timbre descriptors, temporal envelope and spectral statistics. These features then feed a similarity index in Solr.
Figure: the SonixCycle perceptual map.
A 2D t-SNE projection enables visual browsing of the library: audio files that sound alike sit close together on the map.
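The projection step can be sketched with scikit-learn's t-SNE implementation. The feature matrix below is random stand-in data (one row per audio file, e.g. the mean of its per-segment descriptors); whether the project used this particular library is an assumption.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical per-file feature matrix: 200 files, 20-dim descriptors.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 20))

# Project to 2D so perceptually similar files land close together.
coords = TSNE(n_components=2, perplexity=30,
              init="pca", random_state=0).fit_transform(features)
print(coords.shape)  # (200, 2) -> x/y positions for the map
```

Each row of `coords` gives the screen position of one file on the perceptual map; the designer then pans and zooms through neighbourhoods of similar-sounding material instead of scrolling a tag list.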

Publication & patent

The interface and the indexing workflow are protected under patent EP3430535 / WO2017158159. The research paper was published at Audio Mostly 2016:

Going further