We are still a long way from truly open source AI

Open-source artificial intelligence has been one of the most surprising tech stories of the past year. While companies like OpenAI and Google have poured billions of dollars into building ever more powerful AI, “open” models that are freely available for developers to use and adapt have closed the performance gap.

There is just one catch: most of these open source systems are not very open. Critics accuse their backers of “open washing” – seeking the halo effect of open source, and freedom from the restrictions of normal commercial software, without living up to the name.

Efforts to create a truly open source version of AI are finally gaining momentum. But there is no guarantee that its trajectory will match that of open source software, which has played a crucial role in the technology world over the past 20 years. With traditional open source software, such as the Linux operating system, the code is freely available for developers to inspect, use and adapt. So-called open source AI is a very different matter, not least because most modern AI systems learn from data rather than having their logic programmed explicitly in code.

Take Meta’s Llama. Only the “weights” that determine how the model responds to queries are released. Users can take the model and adapt it, but they can’t see the underlying data it was trained on and don’t have enough information to reproduce it from scratch.

For many developers, however, quasi-open models have clear advantages: they can be adapted and trained on a user’s own information without the need to hand sensitive internal data over to another company.

But the lack of openness comes at a price. According to Ayah Bdeir, a senior advisor at the Mozilla Foundation, only truly open source technology would give people a comprehensive understanding of the systems that are gradually affecting every facet of our lives, while ensuring that innovation and competition cannot be stifled by a handful of dominant AI companies.

One answer has come from the Open Source Initiative, which laid down the definition of open source software more than 20 years ago. This week it released a near-final definition of open source AI that could help shape how the field develops.

This would require not only the weights for a model to be released, but also enough information about the data it was trained on so that someone else could reproduce it, as well as all the code behind the system. Other groups such as Mozilla and the Linux Foundation are pushing similar initiatives.

Such moves are already leading to greater segmentation of the AI world. Many companies are being more careful with their terminology – perhaps mindful that the OSI owns the trademark on the term “open source” and could sue to prevent it being applied to AI models that fall outside its definition. Mistral, for example, calls its Nemo an “open weights” model.

Alongside such partially open systems, fully open source models are starting to appear, such as the OLMo large language model developed by the Allen Institute for AI. Yet it is far from clear that open source will have the same impact in the AI world as it has had in traditional software. For that to happen, two things would be needed.

First, the technology would need to meet a big enough need to attract a critical mass of users and developers. In traditional software, the Linux server operating system offered a clear alternative to Microsoft Windows, giving it a large user base and strong support from Microsoft’s rivals, including IBM and Oracle. In the AI world there is no such equivalent. The market is already more fragmented, and many users will find quasi-open LLMs like Llama good enough.

Supporters of open source AI also need to make a stronger case that it is safe. The prospect of such a powerful, general-purpose technology being released for anyone to use justifiably raises widespread concerns.

Oren Etzioni, former head of the Allen Institute for AI, says many of the fears are overblown. When it comes to researching online how to build a bomb or a bioweapon, he says, “You can’t really get more out of these [AI models] than you can get out of Google. There’s plenty of it out there – it’s just packaged differently.” He acknowledges there are some areas where making AI more freely available could cause harm, such as automating the creation of more online disinformation.

“Closed” AI carries risks too. But until the additional risk posed by open sourcing the technology, along with its potential benefits, has been examined more thoroughly, the fears will remain.
