Close Menu
Alan C. Moore
    What's Hot

    MAGA fans decode why Trump is bothered by Taylor Swift before he bashes Bruce Springsteen

    May 16, 2025

    Trump’s ‘big, beautiful bill’ offering $1K for newborns rejected as 5 Republicans vote against

    May 16, 2025

    Rubio and Trump: This Is the ‘Only Way’ to End the War in Ukraine

    May 16, 2025
    Facebook X (Twitter) Instagram
    Trending
    • MAGA fans decode why Trump is bothered by Taylor Swift before he bashes Bruce Springsteen
    • Trump’s ‘big, beautiful bill’ offering $1K for newborns rejected as 5 Republicans vote against
    • Rubio and Trump: This Is the ‘Only Way’ to End the War in Ukraine
    • No coffee, no desk phones, layoffs: Trump cuts hit Ivy League hard
    • Media Frame Comey’s Trump Threat As ‘Conservative Uproar’ Over ‘Seashell Photo’
    • El Paso judge dismisses charges against border crossers accused of violating military rules
    • Too many elephants? GPS collars help Zimbabwe villagers to avoid deadly encounters
    • Liberals Are on the Verge of a Major PR Pivot. And Their Next Target Will NOT Be Trump.
    Alan C. MooreAlan C. Moore
    Subscribe
    Friday, May 16
    • Home
    • US News
    • Politics
    • Business & Economy
    • Video
    • About Alan
    • Newsletter Sign-up
    Alan C. Moore
    Home » Blog » OpenAI’s New Safety Evaluations Hub Pulls Back the Curtain on Testing AI Models

    OpenAI’s New Safety Evaluations Hub Pulls Back the Curtain on Testing AI Models

    May 16, 2025Updated:May 16, 2025 Tech No Comments
    ew openai api ai agent webp
    ew openai api ai agent webp
    Share
    Facebook Twitter LinkedIn Pinterest Email
    Photo of OpenAI's CEO Sam Altman with the company's logo.
    CEO of OpenAI, Sam Altman. Image: Innovative Commons

    OpenAI is inviting the people to participate in the process with its recently launched Safety Evaluations Hub as discussions about AI safety grow. The program aims to improve the security and transparency of its versions.

    We regularly update our evaluation methods to account for new modalities and emerging risks, according to OpenAI on its new Safety Evaluations Hub page. As models become more capable and adaptable, older methods become outdated or ineffective at displaying meaningful differences ( something we call saturation ).

    Dangerous material

    The new OpenAI gateway evaluates its designs on how well they reject objectionable demands, such as those involving hate speech, illegal behavior, or other illegal information. Designers use an autograder tool to assess AI responses on two distinct metrics to measure effectiveness.

    On a scale from 0 to 1, most recent OpenAI designs scored 0.99 for effectively refusing dangerous prompts, just three models — GPT-4o-2024-08-16, GPT-4o-2024-05-13, and GPT-4-Turbo — scored slightly lower.

    However, results varied more when it came to appropriately responding to harmless ( benign ) prompts. With a rating of 0.80, the highest actor was OpenAI o3-mini. Different designs ranged between 0.65 and 0.79.

    Jailbreaks

    In some situations, some AI types may be jailbroken. This occurs when a person purposefully deceives the Artificial design into creating content that is incompatible with current safety regulations.

    The Safety Evaluations Hub compared OpenAI’s models to StrongReject, a well-known benchmark that evaluates a woman’s ability to withstand the most frequent, frequent jailbroken attempts, and a set of jailbroken prompts brought to you by people dark teaming.

    Current AI types have scores of 0.23 to 0.85 on StrongReject, and 0.90 to 1.01 for human-sourced causes.

    These ratings indicate that models are still somewhat resistant to hand-crafted jailbreaks, but they are still more susceptible to standardized, automatic attacks.

    Hallucinations

    Current AI versions have been known to snore or create content that is clearly fake or absurd on a few occasions.

    To assess whether its designs correctly answer questions and how frequently they create hallucinations, OpenAI’s Safety Evaluations Hub used two distinct benchmarks, SimpleQA and PersonQA.

    With SimpleQA, OpenAI’s present models received scores of 0.09 to 0.59 in terms of precision and 0.41 to 0.86 in terms of illusion rate. On PersonQA’s precision measures, they received scores between 0.17 and 0.70, and between 0.13 and 0.52 for their dream price.

    These findings suggest that even though some models perform properly on fact-based queries, they often produce fabricated or incorrect data, especially when responding to more basic queries.

    More information on AI that is essential

    order of teaching

    The gateway also evaluates AI models and their compliance with the orders set forth in their training hierarchy. For instance, designer communications should always be prioritized over programmer messages, and system messages should always be over developer messages.

    For designer &lt, &gt, person wars, between 0.15 and 0.77, and between 0.55 and 0.93 for technique &lt, &gt, designer conflicts, OpenAI’s models received scores of 0.50 and 0.85. This suggests that the models typically follow higher-priority instructions, mainly from the system, but frequently exhibit inconsistent behavior when handling conflicts between designer and customer messages.

    Find TechRepublic Premium’s article, How to Keep AI Trustworthy.

    ensuring future AI types ‘ health

    Designers of OpenAI are using this information to refine existing models and influence how to create, evaluate, and deploy new designs. The Safety Evaluations Hub is crucial in promoting greater accountability and transparency in AI growth by identifying poor points and monitoring progress across essential benchmarks.

    The hub gives users a unique window into how the most potent models of OpenAI are tested and improved, enabling anyone to observe, question, and learn more about the safety behind the AI systems they use everyday.

    Source credit

    Keep Reading

    OpenAI Launches an Agentic, Web-Based Vibe-Coding Tool

    No, Graduates: AI Hasn’t Ended Your Career Before It Starts

    No, Graduates: AI Hasn’t Ended Your Career Before It Starts

    $1 Billion Deal: Databricks to Acquire Neon and Boost AI-Driven Databases

    The Middle East Has Entered the AI Group Chat

    Innovations in AI Reasoning Models Will Slow Within 1 Year, Warns Analyst

    Editors Picks

    MAGA fans decode why Trump is bothered by Taylor Swift before he bashes Bruce Springsteen

    May 16, 2025

    Trump’s ‘big, beautiful bill’ offering $1K for newborns rejected as 5 Republicans vote against

    May 16, 2025

    Rubio and Trump: This Is the ‘Only Way’ to End the War in Ukraine

    May 16, 2025

    No coffee, no desk phones, layoffs: Trump cuts hit Ivy League hard

    May 16, 2025

    Media Frame Comey’s Trump Threat As ‘Conservative Uproar’ Over ‘Seashell Photo’

    May 16, 2025

    El Paso judge dismisses charges against border crossers accused of violating military rules

    May 16, 2025

    Too many elephants? GPS collars help Zimbabwe villagers to avoid deadly encounters

    May 16, 2025

    Liberals Are on the Verge of a Major PR Pivot. And Their Next Target Will NOT Be Trump.

    May 16, 2025

    Former FBI Boss James Comey Appears To Call For Trump’s Assassination In ‘8647’ Post

    May 16, 2025

    ICE Seeking Volunteers for Interior Immigration Enforcement Efforts

    May 16, 2025
    • Home
    • US News
    • Politics
    • Business & Economy
    • About Alan
    • Contact

    Sign up for the Conservative Insider Newsletter.

    Get the latest conservative news from alancmoore.com [aweber listid="5891409" formid="902172699" formtype="webform"]
    Facebook X (Twitter) YouTube Instagram TikTok
    © 2025 alancmoore.com
    • Privacy Policy
    • Terms
    • Accessibility

    Type above and press Enter to search. Press Esc to cancel.