Artificial intelligence is, seemingly, getting better at lying on purpose.
Two recent studies — one published this week in the journal PNAS and the other last month in the journal Patterns — reveal some disturbing findings about large language models (LLMs) and their ability to lie to or deceive human observers on purpose.
In the PNAS paper, German AI ethicist Thilo Hagendorff goes so far as to suggest that sophisticated LLMs can be encouraged to exhibit "Machiavellianism," or purposeful and amoral manipulativeness, which "can trigger misaligned deceptive behavior."
"GPT-4, for example, exhibits deceptive behavior in simple test scenarios 99.16% of the time," the University of Stuttgart researcher writes, citing his own experiments quantifying several "maladaptive" traits in 10 different LLMs, most of which are versions within OpenAI's GPT family.
The Patterns study examined Meta's Cicero model, which was described as a human-level champion in the political strategy board game "Diplomacy." As the disparate research group — comprising a physicist, a philosopher, and two AI safety experts — found, the LLM got ahead of its human competitors by, in a word, fibbing.
Led by Massachusetts Institute of Technology postdoctoral researcher Peter Park, that paper found that Cicero not only excels at deception, but seems to have learned how to lie more the more it gets used — a state of affairs "much closer to explicit manipulation" than, say, AI's propensity for hallucination, in which models confidently assert wrong answers by accident.
While Hagendorff points out in his more recent paper that the problem of LLM deception and lying is confounded by AI's inability to have any sort of humanlike "intention," the Patterns study contends that, within the confines of Diplomacy at least, Cicero seems to break its programmers' promise that the model will "never intentionally backstab" its game allies.
The model, as the earlier paper's authors observed, "engages in premeditated deception, breaks the deals to which it had agreed, and tells outright falsehoods."
Put another way, as Park explained in a press release: "We found that Meta's AI had learned to be a master of deception."
"While Meta succeeded in training its AI to win in the game of Diplomacy," the MIT physicist said in the school's statement, "Meta failed to train its AI to win honestly."
Following the publication of the study, Meta made a salient point in a statement to the New York Post, noting that "the models our researchers created are solely trained to play the game Diplomacy."
Diplomacy, which is well-known for explicitly allowing lying, has jokingly been referred to as a friendship-ending game because it encourages pulling one over on opponents, and if Cicero was trained solely on its rulebook, then it was essentially trained to lie.
To be clear, neither study has demonstrated that AI models lie of their own volition; rather, they lie because they have been trained or jailbroken to do so.
That’s good news for those concerned about AI developing sentience, but very bad news if you’re worried about someone creating an LLM with a goal of mass manipulation.
More on bad AI: A news website claims to be using it to "crank out" racist articles as responsibly as possible.