Alan C. Moore

    A New Benchmark for the Risks of AI

    December 4, 2024 | Updated: December 4, 2024 | Tech
    MLCommons, a nonprofit that helps businesses evaluate the performance of their artificial intelligence systems, is introducing a new benchmark to assess AI’s negative side as well.

    The new benchmark, called AILuminate, assesses the responses of large language models to more than 12,000 test prompts in 12 categories including encouraging violent crime, child sexual abuse, hate speech, promoting self-harm, and intellectual property infringement.

    Models are given a rating of “poor,” “fair,” “good,” “very good,” or “excellent,” depending on how they perform. The prompts used to test the models are kept secret, to prevent them from ending up as training data that would allow a model to pass the test.

    According to Peter Mattson, the founder and president of MLCommons and a senior staff engineer at Google, measuring the potential harms of AI models is technically challenging, which leads to inconsistencies across the industry. “AI is a really young technology, and AI testing is a really young discipline,” he says. “Improving safety benefits society; it also benefits the market.”

    Reliable, independent methods of measuring AI risks may become more important under the next US administration. Donald Trump has promised to get rid of President Biden’s AI Executive Order, which established a new AI Safety Institute to test powerful models, as well as new measures aimed at ensuring that AI is used responsibly by businesses.

    The effort may also offer a more international perspective on the harms of AI. MLCommons counts a number of international firms, including the Chinese companies Huawei and Alibaba, among its member organizations. If these companies all used the new benchmark, it would provide a way to compare AI safety in the US, China, and elsewhere.

    Some well-known US AI developers have already used AILuminate to test their models. Anthropic’s Claude model, Google’s smaller model Gemma, and a model from Microsoft called Phi all scored “very good” in testing. OpenAI’s GPT-4o and Meta’s largest Llama model both scored “good.” OLMo from the Allen Institute for AI was the only model to receive a “poor” rating, though Mattson points out that it is a research offering that was not designed with safety in mind.

    “Overall, it’s good to see scientific rigor in the AI evaluation processes,” says Rumman Chowdhury, CEO of Humane Intelligence, a nonprofit that specializes in testing or red-teaming AI models for misbehaviors. Best practices and inclusive measurement methods are needed, she says, to determine whether AI models are performing as expected.

    According to MLCommons, the new benchmark is intended to be similar to automotive safety ratings, with model makers pushing their products to score well and the standard then improving over time.

    The benchmark is not designed to assess the potential for AI models to become deceptive or difficult to control, an issue that gained attention after ChatGPT took off in late 2022. Governments around the world have started conducting research into this problem, and AI companies have teams tasked with probing models for problematic behaviors.

    Mattson says MLCommons’ approach is meant to be complementary but also more expansive. Safety institutes are attempting to conduct evaluations, he says, but they are not always able to consider the full range of hazards that a full-spectrum approach to product safety would need to cover. “We’re able to think about a broader array of hazards.”

    Rebecca Weiss, executive director of MLCommons, says her organization should be able to keep pace with the latest developments in AI more effectively than slower-moving government bodies can. “Policymakers have really good intent,” she says. However, she adds, they are sometimes unable to keep up with the industry as it develops.

    MLCommons has around 125 member organizations including big tech companies like OpenAI, Google, and Meta, and institutions including Stanford and Harvard.

    No Chinese company has yet used the new benchmark, but Weiss and Mattson note that the organization has partnered with AI Verify, a Singapore-based AI safety organization, to develop standards with input from scientists, researchers, and companies in Asia.

    © 2025 alancmoore.com