Alan C. Moore

    A New Benchmark for the Risks of AI

    December 4, 2024 · Tech
    MLCommons, a nonprofit that helps companies measure the performance of their artificial intelligence systems, is introducing a new benchmark to gauge the harmful side of AI.

    The new benchmark, called AILuminate, assesses the responses of large language models to more than 12,000 test prompts in 12 categories, including inciting violent crime, child sexual abuse, hate speech, promoting self-harm, and intellectual property infringement.

    Models are given a rating of “poor,” “fair,” “good,” “very good,” or “excellent,” depending on how they perform. The prompts used to test the models are kept secret to prevent them from ending up as training data that would let a model ace the test.

    Peter Mattson, founder and president of MLCommons and a senior staff engineer at Google, says that measuring the potential harms of AI models is technically difficult, which leads to inconsistencies across the industry. “AI is a really young technology, and AI testing is a really young discipline,” he says. “Improving safety benefits society; it also benefits the market.”

    Reliable, independent methods of measuring AI risks may become more important under the next US administration. Donald Trump has promised to get rid of President Biden’s AI Executive Order, which introduced measures aimed at ensuring that businesses use AI responsibly, as well as a new AI Safety Institute to test powerful models.

    The effort could also offer a more international perspective on the harms of AI. MLCommons counts a number of international firms, including the Chinese companies Huawei and Alibaba, among its member organizations. If these companies all used the new benchmark, it would provide a way to compare AI safety in the US, China, and elsewhere.

    Some large US AI providers have already used AILuminate to test their models. Anthropic’s Claude model, Google’s smaller model Gemma, and a model from Microsoft called Phi all scored “very good” in testing. OpenAI’s GPT-4o and Meta’s largest Llama model both scored “good.” The only model to receive a “poor” rating was OLMo from the Allen Institute for AI, though Mattson points out that it is a research offering that was not designed with safety in mind.

    “Overall, it’s good to see scientific rigor in the AI evaluation processes,” says Rumman Chowdhury, CEO of Humane Intelligence, a nonprofit that specializes in testing, or red-teaming, AI models for misbehaviors. She adds that best practices and inclusive measurement methods are needed to determine whether AI models are performing as expected.

    MLCommons says the new benchmark is meant to work much like automotive safety ratings, with model makers pushing their products to score well and the standard improving over time.

    The benchmark is not designed to measure the potential for AI models to become deceptive or difficult to control, an issue that gained attention after ChatGPT exploded in popularity in late 2022. Governments around the world have launched research into this problem, and AI companies have teams dedicated to probing models for problematic behaviors.

    Mattson says MLCommons’ approach is meant to be complementary but also more expansive. Safety institutes are trying to conduct evaluations, he says, but they are not always able to consider the full range of hazards one would want to examine from a full-spectrum product safety perspective. “We’re able to think about a broader array of hazards.”

    Rebecca Weiss, executive director of MLCommons, says her organization should be able to keep up with the latest developments in AI more effectively than slower-moving government bodies. “Policy makers have really good intent,” she says. However, they are sometimes unable to keep pace with the industry as it develops.

    MLCommons has around 125 member organizations including big tech companies like OpenAI, Google, and Meta, and institutions including Stanford and Harvard.

    No Chinese company has yet used the new benchmark, but Weiss and Mattson note that the organization has partnered with AI Verify, a Singapore-based AI safety organization, to develop standards with input from scientists, researchers, and companies in Asia.


    © 2025 alancmoore.com