Microsoft quietly unveiled a new kind of AI (it may quietly terrify you)

To be one of the last intact humans is an incredible honour. I’m well aware that in the not-too-distant future, the artists once known as humans will be an endearing mix of flesh and chips.

Perhaps I shouldn’t have been shocked, though, when Microsoft’s researchers came along to hasten that bleak future a little.

It all appeared so innocent and so very science-y. The paper’s title, “Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers,” was intentionally vague.

What do you think this might be about? A new, more efficient way for a computer to transcribe human speech? The paper’s abstract starts out innocuously enough. It uses numerous terms, expressions, and acronyms that are foreign to, say, many lay human language models. It explains that the neural codec language model is called VALL-E.

Surely a name like this is meant to put you at ease. What could be so frightening about a system that sounds like the lovable robot from the movie?

Well, this perhaps: “VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.”

I have always wanted to develop learning capabilities of my own. At this point, all I can do is sit and wait for them to emerge.

The conclusion to be drawn from the researchers’ last sentence, though, is chilling. Microsoft’s brilliant minds need to hear you speak for only three seconds in order to fabricate longer sentences, and maybe even lengthy speeches, that sound remarkably like you.

Neither of us would profit from me getting too bogged down in the technical details, so I won’t.

I’ll just say that the audio library used by VALL-E was put together by Meta, one of the most trusted and respected firms in the world. It’s called LibriLight, and it contains 60,000 hours of speech from 7,000 people.

Naturally, I went and listened to some of VALL-E’s work.

I listened to a man speak for all of three seconds. Then I listened to the eight seconds of his VALL-E version, which had been prompted to say: “They went afterwards warily around the hut fumbling before and about them to locate anything to prove that Warrenton had achieved his task.”

Honestly, I doubt you’d notice much of a difference.

Many of the prompts did, it’s true, read like poorly written excerpts from 18th-century literature. One sample describes how “this humanitarian and right-minded father comforted his sorrowful daughter.” Another: “her mother, hugging her again, did everything she could to alleviate her sorrows.”

But what could I do other than listen to more of the researchers’ examples? Some VALL-E versions sounded more suspicious than others. The diction seemed off. They felt spliced together.

The overall effect, however, is genuinely eerie.

You have been warned already, of course. You know it is best to say nothing when fraudsters call, since they may record you and use the audio to place fraudulent orders in your voice.

However, this seems to be on a whole other level of sophistication. Maybe I’ve watched too much of Peacock’s “The Capture,” where deepfakes are presented as a routine tool of government. Maybe I shouldn’t worry, since Microsoft seems like such a nice, harmless firm these days.

Still, the thought that someone, anyone, might be duped into thinking I said something I didn’t and never would isn’t exactly a source of solace for me. Especially considering the study authors’ claim that they can also recreate the “emotion and acoustic environment” of the first three seconds of a speaker’s voice.

You’ll be glad to know that the researchers seem to have anticipated this discomfort. “Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker,” they say.

Which brings us to the question: what is the solution? The researchers suggest building a detection system.

Which may lead some to ask, “Why did you do this at all, then?”

The answer, as with so many technological advances, is probably: “Because we could.”