Inside Zapr: The team that breaks down sound into numbers

Written by Zapr Media Labs | Apr 26, 2017 5:55:11 AM

Using homegrown, cutting-edge digital signal processing (DSP), Zapr Media Labs is pioneering large-scale TV-to-mobile analytics and user profiling in India.

Zapr’s technology breaks down ambient sound into numbers without recording any audio, and this is at the core of our signal processing. The current three-member team talks about the intractable DSP problems they face in building something that has never been done before - tracking consumption of not just television but all media through passive audio fingerprinting.

Q: What is the role of DSP at Zapr Media Labs?

Srikanth: Signal processing is the core of Zapr technology. We try to find a unique signature (fingerprint) in a snippet of audio that is robust enough to withstand environmental noise, different physical recording locations and the audio quality degradation of low-end phones - all of this while retaining only about 0.05% of the data to protect user privacy. We send these fingerprints to our servers and match them against 600 channels live. In the process we find out users' consumption behaviour over a period of time - like whether they prefer The Kapil Sharma Show over The Big Bang Theory.

Q: What is the basic workflow from fingerprinting to matching data in the backend?

Hari: We understand audio as certain numbers which occur at certain times. Using this sequence, fingerprints (queries) are generated on the phone itself and sent to be matched against the data on our servers.
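To make the workflow above concrete, here is a minimal sketch of one generic way to fingerprint and match a sequence of time-stamped numbers. It is not Zapr's algorithm, which the interview does not describe. It assumes each audio frame has already been reduced to one integer (for example, the index of its strongest frequency bin), and the names `fingerprint`, `build_index`, `match` and `fan_out` are hypothetical. Matching works by letting every shared hash vote for a time offset between query and reference.

```python
from collections import Counter, defaultdict


def fingerprint(peaks, fan_out=3):
    """Turn a per-frame sequence of integers into (hash, time) pairs.

    `peaks` is an assumed input: one integer per audio frame.
    Each hash combines a value, a later value, and the gap between them,
    so it depends on relative timing rather than absolute position.
    """
    prints = []
    for t, anchor in enumerate(peaks):
        for gap in range(1, fan_out + 1):
            if t + gap < len(peaks):
                prints.append(((anchor, peaks[t + gap], gap), t))
    return prints


def build_index(reference_peaks):
    """Server side: map each reference hash to the times where it occurs."""
    index = defaultdict(list)
    for h, t in fingerprint(reference_peaks):
        index[h].append(t)
    return index


def match(query_prints, index):
    """Vote for the time offset at which the query best lines up.

    Returns (best_offset, votes), or (None, 0) if nothing matches.
    """
    votes = Counter()
    for h, t_query in query_prints:
        for t_ref in index.get(h, ()):
            votes[t_ref - t_query] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


# Illustrative data only: a made-up reference and a snippet cut from it.
reference = [7, 3, 9, 4, 1, 8, 2, 6, 5, 0, 7, 1, 4, 9, 3]
query = reference[5:11]
offset, votes = match(fingerprint(query), build_index(reference))
print(offset, votes)
```

In this sketch only the hashes leave the "phone"; the raw sequence is never sent, which mirrors the idea of matching without transmitting audio.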

Srikanth: The immediate concern was to increase the match percentage and decrease false positives (when something that should not match a particular channel has incorrectly matched). There was no predefined logic or algorithm that we could use, so we started approaching the problem from different angles.
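The two quantities mentioned above can be tracked with a few lines of code. The sketch below is an illustration under assumed definitions, not Zapr's evaluation method: the match percentage is taken as the share of queries that truly came from a channel and were assigned to the right one, and the false positive percentage as the share of all queries assigned to a wrong channel. The function name `match_metrics` and the channel labels are hypothetical.

```python
def match_metrics(results):
    """Compute (match_pct, false_positive_pct) from labelled results.

    `results` is a list of (true_channel, predicted_channel) pairs.
    None means "no channel was playing" (truth) or "no match" (prediction).
    """
    matchable = [pair for pair in results if pair[0] is not None]
    correct = sum(1 for truth, pred in matchable if pred == truth)
    # A false positive is any prediction of a channel that is not the truth.
    false_pos = sum(
        1 for truth, pred in results if pred is not None and pred != truth
    )
    match_pct = 100.0 * correct / len(matchable) if matchable else 0.0
    false_pos_pct = 100.0 * false_pos / len(results) if results else 0.0
    return match_pct, false_pos_pct


# Illustrative labels only.
results = [
    ("channel_a", "channel_a"),  # correct match
    ("channel_a", "channel_b"),  # false positive: wrong channel
    ("channel_b", "channel_b"),  # correct match
    (None, "channel_a"),         # false positive: nothing was playing
    ("channel_b", None),         # missed match
    (None, None),                # correctly left unmatched
]
match_pct, false_pos_pct = match_metrics(results)
print(match_pct, round(false_pos_pct, 2))
```

Tightening a match threshold usually lowers false positives at the cost of missed matches, which is why both numbers need to be watched together.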

Q: What is the idea behind building a song matcher app?

Srikanth: Our goal is to scale up our channel matching system and make it content agnostic - to match queries not just with 600 channels but with large amounts of audio data from YouTube, video on demand, radio and other audio content, which add up to millions of hours of data.

One use case of the generalised solution is matching music and songs.

Q: What are the challenges you face with the song matching project?

Srikanth: We started by matching tens of thousands of hours' worth of song data and created new layers of algorithms to reduce the search space. Our biggest problem was finding infrastructure for such large amounts of data. Sometimes a single experiment would take up 1TB of memory. So we had to look for cloud services with large storage capacities. Although the app gave us accuracy rates above 90%, we’re still working on bringing it close to 100%, like our channel matching.

Srikanth: Currently, we have huge amounts of reference data on the server side, let’s say 20,000 hours, which is equivalent to almost 3.5 lakh songs - the entire film industry of India since it started! If we searched the entire 20,000 hours, it would take 7-8 hours to match a single query. So we came up with similar-looking subsets of the query, where we don’t look for the exact match but systematically reduce the search space.
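One common way to avoid scanning every reference is to shortlist candidates with a cheap, approximate lookup and run the expensive comparison only on the shortlist. The sketch below shows that general idea and is not the method used at Zapr, which the interview does not detail. It assumes each track is already a list of integers, and the names `coarse_keys`, `build_coarse_index`, `shortlist`, `bucket` and `top_k` are hypothetical. Quantising the values means that similar, not only identical, snippets share keys.

```python
from collections import Counter, defaultdict


def coarse_keys(values, bucket=4):
    """Quantise neighbouring values into coarse keys.

    Integer division by `bucket` makes nearby values collide on purpose,
    so a slightly noisy query can still share keys with its reference.
    """
    return {(a // bucket, b // bucket) for a, b in zip(values, values[1:])}


def build_coarse_index(references):
    """Map each coarse key to the set of reference IDs that contain it."""
    index = defaultdict(set)
    for ref_id, values in references.items():
        for key in coarse_keys(values):
            index[key].add(ref_id)
    return index


def shortlist(query, index, top_k=2):
    """Return up to `top_k` reference IDs sharing the most keys with the query."""
    hits = Counter()
    for key in coarse_keys(query):
        for ref_id in index.get(key, ()):
            hits[ref_id] += 1
    return [ref_id for ref_id, _ in hits.most_common(top_k)]


# Illustrative data only: three made-up tracks and a noisy snippet of one.
references = {
    "song_a": [10, 22, 35, 47, 3, 18],
    "song_b": [60, 71, 64, 90, 82, 77],
    "song_c": [100, 120, 111, 130, 140, 125],
}
index = build_coarse_index(references)
noisy_query = [61, 70, 65, 91]  # close to the start of song_b, not identical
print(shortlist(noisy_query, index))
```

Only the shortlisted references would then go through a detailed match, so the cost of a query depends on the shortlist size rather than on the full 20,000 hours.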

Q: How is R&D at Zapr Media Labs contributing to the larger field of DSP?

Dhritiman: DSP is a very large field in itself. We’re working on very specialised problems with audio signals that have never been attempted at such a large scale. So when we search for similar literature or patents, we find very few that are applicable to our problems, and absolutely nothing that fits the scale of what we do here at Zapr.

Hari: Moreover, unlike most DSP projects, we can’t strictly define any of our problems. We can’t permanently distinguish good source from noise. For channel matching, the TV in the background is more important than a voice in the foreground, whereas for the song matcher app, the song in the foreground is the good source and the TV in the background becomes noise. Each of these problems could be an individual PhD study, or even several.

Srikanth: Major companies have ventured into audio fingerprinting, but they are still looking at it from the perspective of video - the audio buckets are divided based on which bucket the video falls under. But even they don’t handle a scale of millions of hours of data like we do. Once we get our patents sorted, we can definitely go ahead and publish papers which will be very new to the DSP community.

Q: What are the other areas of Zapr tech in which the DSP team plays a crucial role?

Dhritiman: DSP plays an important role in code integration, where we integrate our code bases on both the mobile side (SDK) and the server side. We provide a library for integration and expose certain APIs. These APIs need to support a multi-threaded environment. The challenging part is that when an error occurs, it is difficult to debug because of the long pipeline - the fingerprinting, transmission from mobiles and reception on the server end. We have to ensure that the entire thing passes through all the stages accurately.

Q: Given the scarcity of learning resources and scale of data, how do you work your way around problems?

Hari: Sometimes we may get an idea just by looking at the patterns. It could even be total guesswork - trial and error based on intuition, because a small change in a parameter might do wonders!

Dhritiman: The key is to experiment a lot - at times the solution lies in a peculiar idea which may not work theoretically (laughs) but gives fabulous results in an experiment. The reverse is also true sometimes - something that should work theoretically does not work in our scenarios.

Q: What kind of support do you receive from the founders to take on these challenges?

Srikanth: Our problems are such that something which should take 15 days to solve might have infrastructure issues where copying the data itself takes a whole week. And dealing with them requires immense support from those heading the company. Our CTO Sajo Mathews is someone who truly understands the scale of our problems, so he gives us the time and technical support to work them out.

Hari: Maybe inside he is frustrated sometimes, which happens naturally when not all experiments give good results, but it never shows; Sajo tries to understand and accepts it whether things fail or succeed.

Q: And finally, what’s it like working together in the DSP team here at Zapr?

Srikanth: We are a small, well-gelled team. The three of us do everything from coding to testing, integration and maintaining the project. We don’t push work around and we know fairly well who handles what. This kind of setup is great for R&D since we freely interact with each other and with other teams, which is important for building cohesive technology.

Dhritiman: This is my second job and it’s been a really good experience for me. Early in your career you definitely don’t want a job where you just tweak a piece of code here and there. You want something that helps you learn and I’ve been learning a lot from these two guys and from the team at Zapr.

"I think the single biggest motivating factor for anyone is to work on hard, perhaps seemingly impossible problems, and then to see it happen. The DSP team exemplifies this aspect of working at Zapr the most."

- Sajo Mathews, CTO at Zapr Media Labs

Read how Zapr’s fingerprinting algorithm got rated best in class globally at MIREX - audio content recognition contest.

Zapr is hiring!

If you're a researcher or student who is interested in the fields of Digital Signal Processing, Audio Content Recognition, Automatic Speech Recognition, Speech Synthesis, Music Information Retrieval and Computer Vision, do send your resume to ps@zapr.in.

Srikanth Konjeti (left) - Ex-Harman International (India), heads the DSP team at Zapr with more than 13 years of experience in the field. He lives in Bangalore with his family and is busy solving interestingly frustrating DSP problems.

Harikrishnan Potty N (right) - Ex-Samsung R&D Institute, also lives in Bangalore with his wife and kid. A graduate of IIT Bombay, he has more than 8 years of experience in the audio signal processing and graphics domains.

Dhritiman Kashyap (centre) graduated from IIT Delhi with a Masters in Digital Signal Processing. He is two years into his career and loves his job at Zapr Media Labs.
