Many of us have had to learn to change the way we speak in the presence of an Alexa device. To make certain it (she?) understands me, I find I must adjust how I address Alexa, especially the manner in which I frame a question. Voice interfaces with computers and artificial intelligence are becoming ubiquitous, and speaking aloud to our computers may come to replace the keyboard as the way we query the internet. One implication is that we will need to learn a particular kind of vocal “etiquette” for speaking to such machines. Speech classes will likely become a regular feature of schooling, though speech in this case will mean the formalized way we communicate with our devices. Students will learn to develop a formal, public voice for speaking aloud to the computer.
The increasing use of voice interfaces with computers signals a significant cultural shift: from text to speech. Transcription tools are readily accessible today: I use an application called Otter.ai for reasonably good audio transcriptions. Although I am currently typing these words, we may find that more and more “writers” will be those who speak their words aloud to a computer. In ancient Rome, those we think of as writers were in many cases speaking their thoughts and having these written down by an amanuensis, an assistant who took dictation.
In the speech-centric culture that is emerging, a writer would speak to a “digital scribe.” What will it mean to learn how to “write” in such a speech-centric culture? Does this mean that the arts of rhetoric will make a comeback? “To write” will mean “to dictate,” which suggests that diction classes would make a return to the curriculum.
The literary theorist Walter Ong once referred to this cultural condition as “secondary orality.” Primary oral cultures were those that possessed no knowledge of writing at all; such cultures featured not the writer but the oral bard. Think of the archaic Greek culture that produced the Homeric epics, which were written down only some time later, as Greek culture became literate. Secondary orality is the condition in which the technology of writing exists, but most communication is oral. Ong envisioned a television-saturated culture as his model of such electronic orality. What I am describing here as secondary orality is a voice-aural culture.
As evidence of such a growing voice-aural culture, I note that the voice is quickly becoming the biometric security interface of choice. Rather than facial recognition or the now-archaic password, that memorized sequence of alphanumeric symbols, our unique voices provide the secure credential. Dan Hansen crows that “Voice biometrics…is both a security game-changer and a customer-service home run. The technology recognizes a customer’s unique vocal patterns and grants him or her secure access immediately.”
Alibaba Cloud similarly raves about the security advantages of voiceprint.
“Voiceprint refers to the acoustic frequency spectrum that carries the speech information in a human voice. Like fingerprints, it has unique biometric signatures, is individual-specific, and can function as an identification method. The acoustical signal is a unidimensional continuous signal. On discretization, you will get the acoustical signal that can be processed by conventional computers.” (from “Voiceprint Recognition System — Not Just a Powerful Authentication Tool”)
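The quoted passage compresses a real pipeline: a continuous acoustic signal is discretized by sampling, reduced to a spectrum, and that spectrum serves as a speaker-specific signature. As a toy illustration only, not any vendor’s actual system, the following Python sketch samples two invented “speakers,” computes crude magnitude spectra, and compares them; the frequencies, sampling rate, and similarity measure are all assumptions chosen for clarity.

```python
import math

def sample(signal, rate_hz, duration_s):
    """Discretize a continuous signal by sampling it at rate_hz."""
    n = int(rate_hz * duration_s)
    return [signal(i / rate_hz) for i in range(n)]

def magnitude_spectrum(samples, n_bins=128):
    """Naive DFT magnitudes: a crude stand-in for a spectral 'voiceprint'."""
    n = len(samples)
    spec = []
    for k in range(n_bins):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        spec.append(math.hypot(re, im))
    return spec

def cosine_similarity(a, b):
    """Compare two voiceprints; 1.0 means identical spectral shape."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Two hypothetical speakers: same 120 Hz pitch, different overtone mix.
speaker_a = lambda t: math.sin(2 * math.pi * 120 * t) + 0.6 * math.sin(2 * math.pi * 240 * t)
speaker_b = lambda t: math.sin(2 * math.pi * 120 * t) + 0.1 * math.sin(2 * math.pi * 360 * t)

RATE = 1024  # samples per second (toy value; real audio uses 8-48 kHz)
fp_a = magnitude_spectrum(sample(speaker_a, RATE, 0.25))
fp_a2 = magnitude_spectrum(sample(speaker_a, RATE, 0.25))
fp_b = magnitude_spectrum(sample(speaker_b, RATE, 0.25))

print(cosine_similarity(fp_a, fp_a2))  # same speaker: close to 1.0
print(cosine_similarity(fp_a, fp_b))   # different speaker: noticeably lower
```

Production systems replace each stage with something far more robust (mel-frequency features, learned speaker embeddings), but the skeleton is the same: discretize, extract a spectrum-derived signature, compare.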
No more memorizing different passwords: simply use your voice and voilà! Instant access.
“The Chinese saying of ‘someone may not yet be here bodily, but you can already hear him/her speaking’ in real life vividly describes a scene where you identify another person by the voice.” (from “Voiceprint Recognition System — Not Just a Powerful Authentication Tool”)
That folksy saying disguises a more problematic effect of the increasing ubiquity of voiceprint.
One speech recognition expert has already raised alarms about the increasing presence of voiceprint technology in the workplace. Martyn Farrows warns that companies that employ voice technologies could be unwittingly collecting voice data, “and there are ethics and GDPR [General Data Protection Regulation] issues to be considered,” he says. Because Europe has the GDPR in place, such speech eavesdropping may prove less of a concern there. But in the U.S. and China, with far less stringent protections, the concerns Farrows raises are real.
“What are the companies who collect our voice data doing with it in addition to building better voice engines?” Farrows asks. “Are they selling it, using it to profile us in some way, using it to refine an advertising-based business model that targets us as individuals? The lack of transparency around big tech’s use of voice data is a concern.”
Indeed, part of our speech education may well involve learning to watch what we say in public. As you might have guessed, these technologies are also being used for digital surveillance. We know that many governments use facial recognition to identify protesters or otherwise keep track of people. We also know that voiceprint is already being used for intelligence purposes, for example to identify those governments believe to be potential terrorists. The U.S. Bureau of Prisons is reported to be collecting prisoners’ voiceprints. It is not much of a stretch to imagine governments using voiceprint technology to monitor citizens’ activities.
IBM recently announced that it will no longer research or make available facial recognition technology. And in the wake of the worldwide George Floyd protests, Amazon announced that it was pausing police use of its facial recognition software for one year. This might comfort civil libertarians, but I wonder whether one form of intrusive digital surveillance will simply and quietly be replaced by an even more invasive one.
As the Hong Kong protesters demonstrated, wearing masks or other elaborate face coverings can fool facial recognition technologies. Would protesters be able to disguise their voices sufficiently to fool voiceprint technologies? If governments begin routinely monitoring our speech patterns, they would be policing not just the content of our speech but also its performance: how our speech sounds are formed. The future of public speech may lie not in the suppression of what we say, but in the self-censorship of the manner in which we say it.
What is the vocal equivalent of a nom de plume? Who will be the vocal Publius?