OpenAI debuts Whisper API for speech-to-text transcription and translation

 


  To coincide with the rollout of the ChatGPT API, OpenAI today launched the Whisper API, a hosted version of the open source Whisper speech-to-text model that the company released in September.


Priced at $0.006 per minute, Whisper is an automatic speech recognition system that OpenAI claims enables “robust” transcription in multiple languages as well as translation from those languages into English. It takes files in a variety of formats, including M4A, MP3, MP4, MPEG, MPGA, WAV and WEBM.
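As a rough illustration of the pricing and format list above, the sketch below estimates what a recording would cost to transcribe and checks whether a file extension is among those the article names. It rests on assumptions the article does not confirm: that billing scales linearly with audio length (actual rounding rules may differ) and that the file extension reflects the real format. The function names are hypothetical and not part of any OpenAI library.

```python
from pathlib import Path

# Price quoted in the article; assumed here to scale linearly with duration.
PRICE_PER_MINUTE_USD = 0.006

# Formats listed in the article, written as lowercase file extensions.
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the transcription cost, assuming simple per-second proration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # A one-hour recording is 60 minutes at $0.006 per minute.
    print(round(estimate_cost_usd(3600), 2))
    print(is_supported("interview.MP3"), is_supported("notes.flac"))
```

Under these assumptions, an hour of audio comes to roughly 36 cents, which gives a sense of how the per-minute price adds up for longer recordings.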


Countless organizations have developed highly capable speech recognition systems, which sit at the core of software and services from tech giants like Google, Amazon and Meta. But what makes Whisper different is that it was trained on 680,000 hours of multilingual and “multitask” data collected from the web, according to OpenAI president and chairman Greg Brockman, which led to improved recognition of unique accents, background noise and technical jargon.


“We released a model, but that actually was not enough to cause the whole developer ecosystem to build around it,” Brockman said in a video call with TechCrunch yesterday afternoon. “The Whisper API is the same large model that you can get open source, but we’ve optimized to the extreme. It’s much, much faster and extremely convenient.”


To Brockman’s point, there’s plenty in the way of barriers when it comes to enterprises adopting voice transcription technology. According to a 2020 Statista survey, companies cite accuracy, accent- or dialect-related recognition issues and cost as the top reasons they haven’t embraced tech like speech-to-text.


Whisper has its limitations, though — particularly in the area of “next-word” prediction. Because the system was trained on a large amount of noisy data, OpenAI cautions that Whisper might include words in its transcriptions that weren’t actually spoken — possibly because it’s both trying to predict the next word in audio and transcribe the audio recording itself. Moreover, Whisper doesn’t perform equally well across languages, suffering from a higher error rate when it comes to speakers of languages that aren’t well-represented in the training data.


That last bit is nothing new to the world of speech recognition, unfortunately. Biases have long plagued even the best systems, with a 2020 Stanford study finding systems from Amazon, Apple, Google, IBM and Microsoft made far fewer errors (an average error rate of about 19%) with users who are white than with users who are Black.

Despite this, OpenAI sees Whisper’s transcription capabilities being used to improve existing apps, services, products and tools. Already, AI-powered language learning app Speak is using the Whisper API to power a new in-app virtual speaking companion.


If OpenAI can break into the speech-to-text market in a major way, it could be quite profitable for the Microsoft-backed company. According to one report, the segment could be worth $5.4 billion by 2026, up from $2.2 billion in 2021.


“Our picture is that we really want to be this universal intelligence,” Brockman said. “We really want to, very flexibly, be able to take in whatever kind of data you have — whatever kind of task you want to accomplish — and be a force multiplier on that attention.”
