Alan C. Moore

    Meet The AI Agent With Multiple Personalities

    April 16, 2025 (Updated: April 16, 2025) | Tech

    In the coming years, agents are widely expected to take over more and more chores on behalf of humans, including using computers and smartphones. For now, though, they’re too error-prone to be of much use.

    A new agent called S2, created by the startup Simular AI, combines frontier models with models specialized for using computers. The agent achieves state-of-the-art performance on tasks like using apps and manipulating files—and suggests that turning to different models in different situations may help agents advance.

    “Computer-using agents are different from large language models and different from coding,” says Ang Li, cofounder and CEO of Simular. “It’s a different type of problem.”

    In Simular’s approach, a powerful general-purpose AI model, like OpenAI’s GPT-4o or Anthropic’s Claude 3.7, is used to reason about how best to complete the task at hand—while smaller open source models step in for tasks like interpreting web pages.

    Li, who was a researcher at Google DeepMind before founding Simular in 2023, explains that large language models excel at planning but aren’t as good at recognizing the elements of a graphical user interface.
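
    The division of labor described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Simular's implementation: `general_planner`, `ui_grounder`, and `run_task` are hypothetical names, and both "models" are plain stand-in functions rather than calls to any real model API.

```python
from typing import Callable, List


def general_planner(task: str) -> List[str]:
    """Stand-in for a large general-purpose model (hypothetical).

    Splits a task into sub-goals; a real planner would reason about the task.
    """
    return [part.strip() for part in task.split(",") if part.strip()]


def ui_grounder(subgoal: str) -> str:
    """Stand-in for a small specialist model (hypothetical).

    Maps a sub-goal to a concrete GUI action; a real grounder would
    inspect a screenshot or page structure to locate the element.
    """
    return f"click(element matching '{subgoal}')"


def run_task(
    task: str,
    planner: Callable[[str], List[str]] = general_planner,
    grounder: Callable[[str], str] = ui_grounder,
) -> List[str]:
    """Route planning to one model and GUI grounding to another."""
    actions = []
    for subgoal in planner(task):
        actions.append(grounder(subgoal))
    return actions


if __name__ == "__main__":
    for action in run_task("open settings, toggle wifi"):
        print(action)
```

    Passing the planner and grounder in as arguments keeps the two roles swappable, which is the point of the mixed-model approach: either side can be replaced without touching the other.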

    S2 is designed to learn from experience with an external memory module that records actions and user feedback and uses those recordings to improve future actions.
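
    One way to picture such a memory module is a store of past episodes that the agent consults before acting. The sketch below is an assumption-laden toy, not S2's actual design: the `Episode` and `ExperienceMemory` names, the numeric feedback score, and the word-overlap lookup are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Episode:
    task: str
    actions: List[str]
    feedback: float  # hypothetical score: 1.0 = user approved, 0.0 = rejected


@dataclass
class ExperienceMemory:
    episodes: List[Episode] = field(default_factory=list)

    def record(self, task: str, actions: List[str], feedback: float) -> None:
        """Store what the agent did for a task and how the user rated it."""
        self.episodes.append(Episode(task, actions, feedback))

    def recall(self, task: str, min_feedback: float = 0.5) -> Optional[Episode]:
        """Return the well-rated past episode whose task shares the most words.

        Word overlap is a crude stand-in for whatever similarity measure
        a real system would use.
        """
        words = set(task.lower().split())
        best: Optional[Episode] = None
        best_key = (0, 0.0)
        for episode in self.episodes:
            if episode.feedback < min_feedback:
                continue  # skip episodes the user rated poorly
            overlap = len(words & set(episode.task.lower().split()))
            key = (overlap, episode.feedback)
            if overlap > 0 and key > best_key:
                best, best_key = episode, key
        return best


if __name__ == "__main__":
    memory = ExperienceMemory()
    memory.record("book a flight to Boston", ["open airline site", "search flights"], 1.0)
    memory.record("book a flight to Denver", ["open wrong site"], 0.0)
    hint = memory.recall("book a flight to Chicago")
    print(hint.actions if hint else "no relevant experience")
```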

    On particularly complex tasks, S2 performs better than any other model on OSWorld, a benchmark that measures an agent’s ability to use a computer operating system.

    For example, S2 can complete 34.5 percent of tasks that involve 50 steps, beating OpenAI’s Operator, which can complete 32 percent. Similarly, S2 scores 50 percent on AndroidWorld, a benchmark for smartphone-using agents, while the next best agent scores 46 percent.

    Victor Zhong, a computer scientist at the University of Waterloo in Canada and one of the creators of OSWorld, believes that future big AI models may incorporate training data that helps them understand the visual world and make sense of graphical user interfaces.

    “This will help agents navigate GUIs with much higher precision,” Zhong says. “I think in the meantime, before such fundamental breakthroughs, state-of-the-art systems will resemble Simular in that they combine multiple models to patch the limitations of single models.”

    To prepare for this column, I used Simular to book flights and scour Amazon for deals, and it seemed better than some of the open source agents I tried last year, including AutoGen and vimGPT.

    But even the smartest AI agents are, it seems, still troubled by edge cases and occasionally exhibit odd behavior. In one instance, when I asked S2 to help find contact information for the researchers behind OSWorld, the agent got stuck in a loop hopping between the project page and the login for OSWorld’s Discord.

    OSWorld’s benchmarks show why agents remain more hype than reality for now. While humans can complete 72 percent of OSWorld tasks, agents are foiled 38 percent of the time on complex tasks. That said, when the benchmark was introduced in April 2024, the best agent could complete only 12 percent of the tasks.

    Zhong says that the amount of training data available may limit how good agents can become.

    Perhaps one solution is to add human intelligence to the mix. While looking into Simular, I discovered a research project that shows how effective it can be to blend human skills with those of an AI agent.

    CowPilot, a Chrome plugin developed by a team at Carnegie Mellon University, allows a human to intervene if an AI agent gets stuck. With CowPilot, I can step in and click or type if the agent seems to be dithering.

    Jeffrey Bigham, a professor at CMU who oversaw the project, which was developed by his student, Faria Huq, says the idea of having a human work with an agent “is almost so obvious that it’s hard to believe it’s not the way most people are thinking about it.”

    Most interestingly, Bigham and Huq say that a human and agent working together can perform more tasks than either party working alone. In a limited test, the human-agent combo completed 95 percent of the jobs it was given, while requiring humans to perform only 15 percent of the total steps.
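
    The hand-off between agent and human, along with the bookkeeping behind a figure like "share of steps performed by the human," can be sketched roughly as follows. This is a toy model rather than CowPilot's code: `run_with_human` is a hypothetical function, and a `None` entry is an assumed way of marking a step where the agent is stuck.

```python
from typing import Callable, List, Optional, Tuple


def run_with_human(
    agent_steps: List[Optional[str]],
    human_step: Callable[[int], str],
) -> Tuple[List[str], float]:
    """Run a task where a human takes over whenever the agent is stuck.

    agent_steps[i] is the agent's proposed action for step i, or None
    when the agent has no usable proposal (an assumed convention).
    Returns the action log and the fraction of steps the human performed.
    """
    log: List[str] = []
    human_count = 0
    for index, proposed in enumerate(agent_steps):
        if proposed is None:
            # The agent is stuck, so the human supplies this step.
            log.append(f"human: {human_step(index)}")
            human_count += 1
        else:
            log.append(f"agent: {proposed}")
    share = human_count / len(agent_steps) if agent_steps else 0.0
    return log, share


if __name__ == "__main__":
    steps = ["open page", None, "fill form", "submit"]
    log, share = run_with_human(steps, lambda i: "click login")
    for entry in log:
        print(entry)
    print(f"human share of steps: {share:.0%}")
```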

    “Web pages are often hard to use, especially if you’re not familiar with a particular page, and sometimes the agent can help you find a good path through that would have taken you longer to figure out on your own,” Bigham adds.

    I don’t know about you, but I like the idea of an agent that makes me more productive and less error-prone.
