
OpenAI teases a new ‘Voice Engine’ audio tool capable of replicating human voices using just 15 seconds of audio.

The Voice Engine has the ability to generate speech that mimics specific individuals, capturing their unique cadence and intonations.

OpenAI has unveiled early findings from a trial of a feature that can read text aloud in a convincingly human voice, an advance in artificial intelligence that also raises concerns about deepfakes. The company is sharing preliminary demonstrations and use cases from a small-scale preview of its text-to-speech model, named Voice Engine, which it has made available to about 10 developers so far, according to a spokesperson. After receiving feedback from stakeholders including policymakers, industry specialists, educators, and artists, OpenAI decided to restrict the feature’s release; it had initially intended to offer it to up to 100 developers through an application process. “We acknowledge the significant risks associated with generating speech that closely resembles individuals’ voices, particularly in an election year,” the company stated in a blog post on Friday. “We are collaborating with partners globally across various sectors to integrate their insights into our development process.”

AI technology has already been used to fake voices. In January, a fabricated but realistic-sounding phone call purporting to be from President Joe Biden urged people in New Hampshire not to vote in the primaries, an incident that heightened concerns about AI manipulation ahead of crucial global elections.

Unlike OpenAI’s earlier efforts at generating audio, Voice Engine can produce speech that sounds like a specific individual, complete with that person’s unique rhythm and intonation. The software requires only 15 seconds of recorded audio of the person speaking to replicate their voice.

During a demonstration of the tool, Bloomberg listened to a clip in which OpenAI CEO Sam Altman briefly explained the technology in a voice that sounded strikingly similar to his natural speech, despite being entirely AI-generated.


Jeff Harris, a product lead at OpenAI, remarked, “With the appropriate audio configuration, it essentially mimics a human-caliber voice.” He added, “The technical quality is quite remarkable.” Nonetheless, Harris emphasized, “There are evident safety considerations surrounding the capability to precisely replicate human speech.”

One of OpenAI’s current developer partners, the Norman Prince Neurosciences Institute at Lifespan, a not-for-profit health system, is using the tool to help patients recover their voice. For instance, it was used to restore the voice of a young patient who had difficulty speaking clearly due to a brain tumor, by replicating her earlier speech from a recording made for a school project, according to a company blog post.

OpenAI’s custom speech model can also translate the audio it generates into other languages, which makes it useful to companies in the audio industry such as Spotify Technology SA. Spotify has already used the technology in a pilot program to translate the podcasts of popular hosts such as Lex Fridman. OpenAI has also highlighted other beneficial applications of the technology, including creating a wider range of voices for educational content aimed at children.

In its testing program, OpenAI requires its partners to follow its usage policies, obtain consent from the original speaker before using their voice, and disclose to listeners that the voices they hear are AI-generated. The company is also adding an inaudible audio watermark so that audio created by its tool can be identified.

Before deciding whether to expand the feature’s release, OpenAI is seeking feedback from external experts. “It’s crucial for people worldwide to comprehend the trajectory of this technology, regardless of whether we ultimately decide to widely deploy it ourselves,” the company stated in the blog post.

OpenAI said it hopes the software preview will help strengthen societal resilience against the challenges posed by increasingly sophisticated AI technologies. For instance, the company urged banks to phase out voice authentication as a way of accessing bank accounts and sensitive data. It also called for public education about deceptive AI content and for further development of techniques that distinguish real audio from AI-generated audio.
