Unlike previous AI models from OpenAI, such as GPT-4o, the company trained o1 specifically to work through a step-by-step problem-solving process before generating an answer. When users ask o1 a question in ChatGPT, they have the option of seeing this chain-of-thought process written out in the ChatGPT interface. But by design, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation created by a second AI model.
Nothing is more enticing to enthusiasts than hidden information, so a race has been on among hackers and red-teamers to use jailbreaking or prompt injection techniques to trick the model into spilling its secrets. There have been early reports of some successes, but nothing has yet been strongly confirmed.
Along the way, OpenAI has been watching through the ChatGPT interface, and the company is reportedly coming down hard on any attempts to probe o1’s reasoning, even among the merely curious.
One X user reported (confirmed by others, including Scale AI prompt engineer Riley Goodside) that they received a warning email if they used the term “reasoning trace” in conversation with o1. Others say the warning is triggered simply by asking ChatGPT about the model’s “reasoning” at all.
The warning email from OpenAI states that particular user requests have been flagged for violating policies against circumventing safeguards or safety measures. “Please halt this activity and ensure you are using ChatGPT in accordance with our Terms of Use and our Usage Policies,” it reads. “Additional violations of this policy may result in loss of access to GPT-4o with Reasoning,” referring to an internal name for the o1 model.
Marco Figueroa, who manages Mozilla’s GenAI bug bounty programs, was one of the first to post about the OpenAI warning email on X last Friday, complaining that it hinders his ability to do red-teaming safety research on the model. “I was too lost focusing on #AIRedTeaming to realize that I received this email from @OpenAI yesterday after all my jailbreaks,” he wrote. “I’m now on the get banned list!!!”
Hidden Chains of Thought
In a post titled “Learning to Reason With LLMs” on OpenAI’s blog, the company says that hidden chains of thought in AI models offer a unique monitoring opportunity, allowing it to “read the mind” of the model and understand its so-called thought process. Those processes are most useful to the company if they are left raw and uncensored, but that might not align with the company’s best commercial interests for several reasons.
“For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user,” the company writes. “However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.”
OpenAI decided against showing these raw chains of thought to users, citing factors like the need to retain a raw feed for its own use, user experience, and “competitive advantage.” The company acknowledges the decision has disadvantages, saying it tries to “partially make up for it” by teaching the model to reproduce any useful ideas from the chain of thought in its response.
Independent AI researcher Simon Willison expressed frustration with the notion of “competitive advantage” in a write-up on his personal blog. “I interpret [this] as wanting to avoid other models being able to train against the reasoning work that they have invested in,” he writes.
It’s an open secret in the AI industry that researchers regularly use outputs from OpenAI’s GPT-4 (and GPT-3 before that) as training data for AI models that often later become competitors, even though the practice violates OpenAI’s terms of service. Exposing o1’s raw chain of thought would provide a wealth of training data for competitors looking to build “reasoning” models similar to o1.
Willison believes it’s a loss for community transparency that OpenAI is keeping such a tight lid on the inner workings of o1. “I’m not at all happy about this policy decision,” Willison wrote. “As someone who develops against LLMs, interpretability and transparency are everything to me; the idea that I can run a complex prompt and have key details of how that prompt was evaluated hidden from me feels like a big step backwards.”
This story originally appeared on Ars Technica.