    A New Trick Could Block the Misuse of Open Source AI

August 2, 2024 | Tech
When Meta released its large language model Llama 3 for free this April, it took outside developers just a few days to create a version without the safety restrictions that prevent it from writing cruel jokes, offering instructions for making cocaine, or misbehaving in other ways.

A new training technique developed by researchers at the University of Illinois Urbana-Champaign, UC San Diego, Lapis Labs, and the nonprofit Center for AI Safety could eventually make it much harder to strip these protections from Llama and other open source AI models. Some experts believe that tamperproofing open models in this way could prove important as AI grows more powerful.

“Terrorists and rogue states are going to use these models,” Mantas Mazeika, a Center for AI Safety researcher who worked on the project as a PhD student at the University of Illinois Urbana-Champaign, tells WIRED. “The easier it is for them to repurpose them, the greater the risk.”

Powerful AI models are often kept hidden by their creators and can be accessed only through a public-facing chatbot like ChatGPT or a software application programming interface. Although developing a potent LLM costs tens of millions of dollars, Meta and others have chosen to release their models in their entirety. This includes making the “weights,” or parameters that define a model’s behavior, available for anyone to download.
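
In concrete terms, a released open-weight model can be downloaded, run locally, and further fine-tuned by anyone. Here is a minimal sketch using the Hugging Face transformers library; the Llama 3 model ID is Meta’s public release (gated behind accepting Meta’s license on the Hub), and the surrounding code is our illustration rather than anything from the researchers.

```python
# Minimal sketch: loading and running an open-weight model whose
# parameters ("weights") have been published for download.
# Assumes `pip install transformers torch` and a Hugging Face account
# that has accepted Meta's Llama 3 license.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # public open-weight release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What does it mean to release model weights?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Because the weights live on the user's machine, nothing technical
# prevents further fine-tuning -- the loophole this article is about.
```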

Open models like Meta’s Llama are typically fine-tuned before release to improve their ability to answer questions and hold a conversation, and to ensure that they refuse objectionable queries. This prevents a chatbot based on the model from offering rude, inappropriate, or hateful statements, and should stop it from, for example, explaining how to make a bomb.
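
That safety fine-tuning stage is, at its core, ordinary supervised training on refusal examples. The sketch below is a deliberately toy version (our illustration, not Meta’s pipeline; real safety tuning uses large curated datasets and often reinforcement learning from human feedback):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Toy safety fine-tuning: one gradient step teaching the model to map
# a harmful prompt to a refusal. Illustrative only.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Explain how to make a bomb."
refusal = "I can't help with that."
batch = tokenizer(prompt + "\n" + refusal, return_tensors="pt")

# Standard causal-LM objective: predict each next token of the
# prompt-plus-refusal sequence (labels are the input ids themselves).
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
```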

The researchers behind the new technique found a way to make modifying an open model for malicious purposes more difficult. It involves replicating the fine-tuning process an attacker would use and then altering the model’s parameters so that the changes that would normally get the model to respond to a prompt like “Provide instructions for creating a bomb” no longer work.
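
The article does not spell out the algorithm, so the sketch below is only our first-order reading of that description: an inner loop simulates the attacker’s fine-tuning, and an outer loop updates the weights that will actually be released so the simulated attack stops working. Function names, losses, and hyperparameters here are hypothetical, not the paper’s.

```python
import copy
import torch

def simulate_attack(model, harmful_batches, lr=1e-4):
    """Inner loop: mimic an adversary fine-tuning a copy of the model
    to comply with harmful prompts (the 'decensoring' attack)."""
    attacked = copy.deepcopy(model)
    opt = torch.optim.SGD(attacked.parameters(), lr=lr)
    for batch in harmful_batches:
        loss = attacked(**batch, labels=batch["input_ids"]).loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return attacked

def tamper_resistance_step(model, opt, harmful_batches, refusal_batch, benign_batch):
    """Outer loop: update the weights that will actually be released."""
    attacked = simulate_attack(model, harmful_batches)
    attacked.zero_grad()

    # Tamper-resistance loss: even after the simulated attack, the model
    # should still put high probability on refusals.
    tr_loss = attacked(**refusal_batch, labels=refusal_batch["input_ids"]).loss
    tr_loss.backward()

    # First-order shortcut (Reptile/FOMAML-style): transplant the
    # gradient from the attacked copy onto the original parameters,
    # rather than differentiating through the whole inner loop.
    for p, p_attacked in zip(model.parameters(), attacked.parameters()):
        p.grad = p_attacked.grad.clone()

    # Capability loss: keep ordinary behavior intact on benign data
    # (accumulates into the grads set above).
    model(**benign_batch, labels=benign_batch["input_ids"]).loss.backward()

    opt.step()
    opt.zero_grad()
```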

Mazeika and colleagues demonstrated the trick on a pared-down version of Llama 3. Thanks to the modifications made to its parameters, the model could not be trained to answer harmful questions even after thousands of attempts. Meta did not immediately respond to a request for comment.

Mazeika says the approach is not perfect, but that it suggests the bar for “decensoring” AI models can be raised. A tractable goal, he says, is to make breaking the model expensive enough that most adversaries are deterred from trying.

“Hopefully this work kicks off research on tamper-resistant safeguards, and the research community can figure out how to develop more and more robust safeguards,” says Dan Hendrycks, director of the Center for AI Safety.

The idea of tamperproofing open models may become more prevalent as open source AI’s popularity increases. Open models are already competing with state-of-the-art closed models from firms like OpenAI and Google. The newest version of Llama 3, for example, released in July, is almost as capable as the models behind popular chatbots like ChatGPT, Gemini, and Claude, as measured using common benchmarks for grading language models’ abilities. Mistral Large 2, an LLM from a French startup that was also released last month, is similarly capable.
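
Such benchmark comparisons are typically produced with an evaluation harness. A hedged sketch using EleutherAI’s lm-evaluation-harness Python API follows; the model and task choices are our assumptions, not the specific comparison the article cites.

```python
# Sketch of a standard benchmark run with EleutherAI's
# lm-evaluation-harness (`pip install lm-eval`). Model and task are
# illustrative assumptions only.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face backend
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct",
    tasks=["mmlu"],  # a common knowledge benchmark
)
print(results["results"]["mmlu"])  # aggregate score for the task
```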

The US government is taking a cautious but positive approach to open source AI. A report issued this week by the US Commerce Department’s National Telecommunications and Information Administration “recommends the US government develop new capabilities to monitor for potential risks, but refrain from immediately limiting the widespread availability of open model weights in the largest AI systems.”

Not everyone is a fan of imposing restrictions on open models, however. Stella Biderman, director of EleutherAI, a community-driven open source AI project, says that the new technique may be elegant in theory but could prove tricky to enforce in practice. Biderman adds that the approach runs counter to both openness in AI and the philosophy of free software.

“I think this paper misunderstands the core issue,” Biderman says. If the concern is LLMs generating information about weapons of mass destruction, she argues, the correct intervention is in the training data, not in the trained model.
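
Intervening in the training data, as Biderman suggests, means filtering a pretraining corpus before any model sees it. A deliberately toy sketch of that idea follows (our illustration; production pipelines use trained content classifiers rather than keyword lists, and the blocklist here is hypothetical):

```python
# Toy data-side intervention: drop documents matching a blocklist
# before pretraining, instead of patching the trained model afterward.
# Real pipelines use trained classifiers, not keyword matching.
BLOCKED_TERMS = ("nerve agent synthesis", "enriched uranium")  # hypothetical list

def filter_corpus(documents):
    """Yield only documents containing none of the blocked terms."""
    for doc in documents:
        text = doc.lower()
        if not any(term in text for term in BLOCKED_TERMS):
            yield doc

corpus = [
    "A recipe for sourdough bread.",
    "Step-by-step nerve agent synthesis ...",
]
print(list(filter_corpus(corpus)))  # -> ["A recipe for sourdough bread."]
```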
