During training for my employer’s new integrated workflow management system, I had to view a bunch of pre-recorded videos. These videos navigated through various screens I was likely to encounter, highlighted relevant sections of each screen, and displayed pop-up text to describe what I was seeing and provide context.
The first time I viewed these screens, I was working from home and remoting into my office computer. Because of this connection configuration, I was unable to use the audio and had to quickly read along with the videos before the pop-ups went away.
The next day, I was back in the office, so I thought I would use the audio and let the computer read to me. Why exercise the eyes and their collection of rectus muscles when you can sit back and let the dulcet tones of a husky or bass narrator read to you?
Instead of a human, a computer narrator assaulted my ears with accents in the wrong places, pitch inflections at odds with a word’s meaning and placement within a sentence (think Mike Myers in View from the Top after Christina Applegate’s character turned the word “assess” into an obscenity: “You put the wrong emPHAsis on the wrong syLLAble”), and an apathetic intensity that made me think more of Pauly Shore in Bio-Dome than of James Earl Jones in, well, anything.
The computer read the words correctly, but it blew the performance because it had no consciousness, no soul, no experience from which to draw, no pain to infuse, no happiness to drape across a symphony of words.
Few things are as intimate and as unique as the human voice. It is, in my opinion, the most beautiful, bountiful, and diverse of all instruments. Of all the things technology makes possible, I’ll wager replicating a human voice is one that cannot be achieved. And if it can, if we can somehow devise a measurement system to determine that we’ve achieved algorithmic replication of a human voice, one that would pass a Turing Test, one a real human would trust and in which that human would find comfort, find love, I say we should not aspire to this. I invoke Dr. Malcolm. The question of should we is as important as can we.
Your voice is the representation of your soul. No matter how close algorithms come to a convincing facsimile, they will only ever approach a limit. They will always lack a soul.
I’m aware that one should not take this to the extreme. One shouldn’t substitute a human for every instance of Alexa in millions of homes around the world. That would be creepy, always having a human listening in on you, although there could be some benefit. A human might fall asleep and miss a few words. Alexa misses nothing.
She chimes in at inappropriate times, such as when you’re trying to say, “Get Alex a hamburger,” and Alexa, circling blue and turquoise light signaling an inopportune entrance into a conversation, says, “Okay, for hamburger, I recommend …” A real buddy would grill me a hamburger, or, better yet, throw a frozen patty in the microwave so I can instantly have my dosage of radiation and beef. A real buddy would say “How good is that?” with no trace of sarcasm.
Alexa is like The Big Bang Theory’s Sheldon Cooper: she gets sarcasm as well as a cat tolerates a bird on the other side of a screen door.
Have you ever received a mild electric shock? Perhaps you’ve plugged in a frayed lamp cord, tried to plug in a perfectly good cord in the dark and gotten your fingertips past the base and onto the prongs, or been jogging while a downed power line whipsawed its way back and forth on the road before you like a snake on acid, catching your heel and giving you the extra jolt needed to finish strong.
With the possible exception of the last example, I’ve always found electric shocks to be unpleasant. It’s a weird sensation. It seems wrong and foreign to my senses, anathema to my sense of what it means to be wholly human.
I feel a similar sensation when I hear an artificially generated voice.
These phonetic fakes cannot approach me and you.