Facebook’s parent company Meta Platforms on Tuesday introduced an AI model that can translate and transcribe speech in numerous languages, a potential cornerstone for products that enable real-time communication across language barriers.
The company said in a blog post that its SeamlessM4T model can translate between text and speech in roughly 100 languages and perform full speech-to-speech translation for 35 languages, combining technology that was previously available only in separate models.
According to CEO Mark Zuckerberg, such tools will help users from around the world communicate in the metaverse, the set of interconnected virtual worlds on which he is staking the company’s future.
According to the blog post, Meta is making the model available to the general public for non-commercial use.
The world’s largest social media company has released a flurry of mostly free AI models this year, including a large language model called Llama that competes directly with the proprietary models offered by Google and Microsoft-backed OpenAI.
Zuckerberg says an open AI ecosystem works to Meta’s advantage, because the company has more to gain from effectively crowdsourcing the creation of consumer-facing tools for its social platforms than from charging for access to the models.
However, Meta faces the same legal questions as the rest of the industry over the training data used to build its models.
In a complaint filed in July against Meta and OpenAI, comedian Sarah Silverman and two other authors claimed that the companies had used their works as training data without their consent.
In a research paper, Meta researchers said they had gathered audio training data for the SeamlessM4T model from 4 million hours of “raw audio originating from a publicly available repository of crawled web data,” without identifying the repository.
A Meta spokesman did not respond to questions about the source of the audio data.
According to the research paper, the text data came from datasets created last year that drew on content from Wikipedia and related websites.