Microsoft releases new AI models to expand further beyond OpenAI

Microsoft announced MAI-Transcribe-1, a new speech-to-text model, and made its in-house MAI-Voice-1 and MAI-Image-2 models broadly available to developers for commercial use for the first time, expanding its proprietary AI capabilities beyond its OpenAI partnership.

Mustafa Suleyman, CEO of Microsoft AI. (GeekWire File Photo / Kevin Lisota)

Microsoft is expanding its roster of in-house AI models, releasing a new speech-to-text system and making two existing models broadly available to developers for the first time.

The moves by Microsoft AI (MAI) are part of a broader effort by the company to expand its proprietary AI capabilities beyond its partnership with OpenAI, giving Microsoft more control over its own destiny in the competition against Google, Amazon, and others.

On Thursday, Microsoft announced MAI-Transcribe-1, a speech-to-text model that it says is the most accurate currently available. The company also released its existing voice and image generation models, known as MAI-Voice-1 and MAI-Image-2, for broad commercial use.

It’s Microsoft’s first major model release since a March reorganization, announced by CEO Satya Nadella, in which Microsoft AI CEO Mustafa Suleyman shifted away from day-to-day Copilot oversight to focus on frontier model development and superintelligence.

Suleyman told The Verge that the transcription model runs at “half the GPU cost of the other state-of-the-art models.” He told VentureBeat that the model was built by a team of just 10 people, and that Microsoft plans to eventually build a frontier large language model to be “completely independent” if needed.

Microsoft also recently hired former Allen Institute for AI CEO Ali Farhadi and other top AI researchers from the Seattle-based institute to further bolster Suleyman’s team, as GeekWire reported last week.

MAI-Transcribe-1 is designed to handle noisy real-world conditions such as call centers and conference rooms, and Microsoft says it is testing integrations with Copilot and Teams. The company says the model offers the best price-performance of any large cloud provider, competing directly with OpenAI’s Whisper and Google’s Gemini on the FLEURS benchmark.

In a blog post, Suleyman called the model “not just the most accurate but also lightning fast.”

MAI-Voice-1 generates natural-sounding speech and now lets developers create custom voices from short snippets of sample audio. MAI-Image-2 ranks in the top three on the Arena.ai image generation leaderboard and is rolling out in Bing and PowerPoint.

All three are available on the Microsoft Foundry developer AI platform and MAI Playground.
