Tracking humans through their voice

 

The world is changing blindingly fast. An estimated 700 centuries' worth of speech is spoken over cellphones alone each day. An equally staggering volume of speech is transmitted over internet channels, captured in videos, and spoken over radio or short-range communication devices. Never before has human speech been captured or encountered by the world in aggregate in such abundance. It is time to worry about the consequences of this. Good things have happened, of course. The world is better connected and more co-ordinated. But bad things happen too. More crimes are being committed through voice. Moreover, when such crimes are committed, they have tremendous reach in the world and therefore tremendous impact.

Our work on voice forensics is focussed on tracking humans through their voice. Its broad scope starts from large-scale, machine-aided scientific enquiry into the fundamental nature of human speech and uses it to computationalize the process of deducing a person's biophysical and environmental parameters. The human voice carries a tremendous amount of information about the speaker. It is also a potent biomarker and biometric. It is as unique to a person as their fingerprint or DNA. The chances that two people's voices are exactly the same come out to be less than one in a quadrillion by conservative estimates. This tells us that every person on earth could potentially be not only identified, but also described, through their voice. With the availability of cutting-edge technologies that we continue to develop, tracking humans and their activities through their voice is now becoming a distinct possibility. Part of our work in this area is founded on our tremendous historical success in automated speech processing over the past decades.

The double-edged sword

With the ability to describe a person through their voice comes the need to prevent the generation of such descriptions by people who may capture your voice without your consent. The technologies for this fall under the realm of privacy-preserving speech processing. Our group also works on this aspect. You can read more about this work here.