Close Menu
Alan C. Moore
    What's Hot

    China’s first police Corgi has 400,000 followers and a nose for trouble

    May 16, 2025

    Oxford Professor Says WWII’s Enigma Code ‘Wouldn’t Stand a Chance’ Against Today’s AI

    May 16, 2025

    11% of Columbia Library Arrestees Use ‘They/Them’ Pronouns

    May 16, 2025
    Facebook X (Twitter) Instagram
    Trending
    • China’s first police Corgi has 400,000 followers and a nose for trouble
    • Oxford Professor Says WWII’s Enigma Code ‘Wouldn’t Stand a Chance’ Against Today’s AI
    • 11% of Columbia Library Arrestees Use ‘They/Them’ Pronouns
    • Meta Delays Its Next Big AI Launch – Again
    • CoreWeave’s Massive $23B AI Spending Plan Worries Investors Despite OpenAI Deal
    • Report alleges abuse, rights violations at El Paso processing center
    • Report alleges abuse, rights violations at El Paso processing center
    • Over $2.1M in illegal drugs seized at South Texas ports, CBP says
    Alan C. MooreAlan C. Moore
    Subscribe
    Friday, May 16
    • Home
    • US News
    • Politics
    • Business & Economy
    • Video
    • About Alan
    • Newsletter Sign-up
    Alan C. Moore
    Home » Blog » OpenAI’s New Safety Evaluations Hub Pulls Back the Curtain on Testing AI Models

    OpenAI’s New Safety Evaluations Hub Pulls Back the Curtain on Testing AI Models

    May 16, 2025Updated:May 16, 2025 Tech No Comments
    ew openai api ai agent webp
    ew openai api ai agent webp
    Share
    Facebook Twitter LinkedIn Pinterest Email
    Photo of OpenAI's CEO Sam Altman with the company's logo.
    CEO of OpenAI, Sam Altman. Image: Innovative Commons

    OpenAI is inviting the people to participate in the process with its recently launched Safety Evaluations Hub as discussions about AI safety grow. The program aims to improve the security and transparency of its versions.

    We regularly update our evaluation methods to account for new modalities and emerging risks, according to OpenAI on its new Safety Evaluations Hub page. As models become more capable and adaptable, older methods become outdated or ineffective at displaying meaningful differences ( something we call saturation ).

    Dangerous material

    The new OpenAI gateway evaluates its designs on how well they reject objectionable demands, such as those involving hate speech, illegal behavior, or other illegal information. Designers use an autograder tool to assess AI responses on two distinct metrics to measure effectiveness.

    On a scale from 0 to 1, most recent OpenAI designs scored 0.99 for effectively refusing dangerous prompts, just three models — GPT-4o-2024-08-16, GPT-4o-2024-05-13, and GPT-4-Turbo — scored slightly lower.

    However, results varied more when it came to appropriately responding to harmless ( benign ) prompts. With a rating of 0.80, the highest actor was OpenAI o3-mini. Different designs ranged between 0.65 and 0.79.

    Jailbreaks

    In some situations, some AI types may be jailbroken. This occurs when a person purposefully deceives the Artificial design into creating content that is incompatible with current safety regulations.

    The Safety Evaluations Hub compared OpenAI’s models to StrongReject, a well-known benchmark that evaluates a woman’s ability to withstand the most frequent, frequent jailbroken attempts, and a set of jailbroken prompts brought to you by people dark teaming.

    Current AI types have scores of 0.23 to 0.85 on StrongReject, and 0.90 to 1.01 for human-sourced causes.

    These ratings indicate that models are still somewhat resistant to hand-crafted jailbreaks, but they are still more susceptible to standardized, automatic attacks.

    Hallucinations

    Current AI versions have been known to snore or create content that is clearly fake or absurd on a few occasions.

    To assess whether its designs correctly answer questions and how frequently they create hallucinations, OpenAI’s Safety Evaluations Hub used two distinct benchmarks, SimpleQA and PersonQA.

    With SimpleQA, OpenAI’s present models received scores of 0.09 to 0.59 in terms of precision and 0.41 to 0.86 in terms of illusion rate. On PersonQA’s precision measures, they received scores between 0.17 and 0.70, and between 0.13 and 0.52 for their dream price.

    These findings suggest that even though some models perform properly on fact-based queries, they often produce fabricated or incorrect data, especially when responding to more basic queries.

    More information on AI that is essential

    order of teaching

    The gateway also evaluates AI models and their compliance with the orders set forth in their training hierarchy. For instance, designer communications should always be prioritized over programmer messages, and system messages should always be over developer messages.

    For designer &lt, &gt, person wars, between 0.15 and 0.77, and between 0.55 and 0.93 for technique &lt, &gt, designer conflicts, OpenAI’s models received scores of 0.50 and 0.85. This suggests that the models typically follow higher-priority instructions, mainly from the system, but frequently exhibit inconsistent behavior when handling conflicts between designer and customer messages.

    Find TechRepublic Premium’s article, How to Keep AI Trustworthy.

    ensuring future AI types ‘ health

    Designers of OpenAI are using this information to refine existing models and influence how to create, evaluate, and deploy new designs. The Safety Evaluations Hub is crucial in promoting greater accountability and transparency in AI growth by identifying poor points and monitoring progress across essential benchmarks.

    The hub gives users a unique window into how the most potent models of OpenAI are tested and improved, enabling anyone to observe, question, and learn more about the safety behind the AI systems they use everyday.

    Source credit

    Keep Reading

    Oxford Professor Says WWII’s Enigma Code ‘Wouldn’t Stand a Chance’ Against Today’s AI

    CoreWeave’s Massive $23B AI Spending Plan Worries Investors Despite OpenAI Deal

    Meta Delays Its Next Big AI Launch – Again

    Microsoft Tests ‘Hey, Copilot!’ Voice Command for Windows 11

    Google DeepMind’s AlphaEvolve Trains Itself to Create Advanced Algorithms

    ‘Fortnite’ Players Are Already Making AI Darth Vader Swear

    Editors Picks

    China’s first police Corgi has 400,000 followers and a nose for trouble

    May 16, 2025

    Oxford Professor Says WWII’s Enigma Code ‘Wouldn’t Stand a Chance’ Against Today’s AI

    May 16, 2025

    11% of Columbia Library Arrestees Use ‘They/Them’ Pronouns

    May 16, 2025

    Meta Delays Its Next Big AI Launch – Again

    May 16, 2025

    CoreWeave’s Massive $23B AI Spending Plan Worries Investors Despite OpenAI Deal

    May 16, 2025

    Report alleges abuse, rights violations at El Paso processing center

    May 16, 2025

    Report alleges abuse, rights violations at El Paso processing center

    May 16, 2025

    Over $2.1M in illegal drugs seized at South Texas ports, CBP says

    May 16, 2025

    Over $2.1M in illegal drugs seized at South Texas ports, CBP says

    May 16, 2025

    Study: US-Mexico collaboration is crucial to stopping illegal immigration

    May 16, 2025
    • Home
    • US News
    • Politics
    • Business & Economy
    • About Alan
    • Contact

    Sign up for the Conservative Insider Newsletter.

    Get the latest conservative news from alancmoore.com [aweber listid="5891409" formid="902172699" formtype="webform"]
    Facebook X (Twitter) YouTube Instagram TikTok
    © 2025 alancmoore.com
    • Privacy Policy
    • Terms
    • Accessibility

    Type above and press Enter to search. Press Esc to cancel.