Anthropic has unveiled a big update to its Claude AI types, including the new” Computer Use” have. Developers can use the upgraded Claude 3.5 Sonnet to manage desktop apps, proceed cursors, push buttons, and type text, basically imitating a PC user.
Instead of creating specific programs to aid Claude in particular tasks, the company is teaching it basic computer skills, allowing it to use a range of common tools and software programs created specifically for people, the company said in a blog post.
Use word prompts and online data to complete this form and move the cursor to start a web browser, using the Anthropic example. The AI leader’s first AI unit that can browse the web is.
The release works by using the technology available to analyze screenshots of what the customer is viewing and determine how many pixels to walk a pointer vertically or horizontally to the desired location. It can take up to hundreds of repeat steps to finish a order, and it will automatically correct itself and try a step in the event of a problem.
The Computer Use API, available today in public beta, finally aims to allow developers to automate repetitive techniques, test program, and do open-ended tasks. Replit is currently testing using it for app development for its Replit Agent product in order to assess functionality.
” Analyzing AIs to interact with computer software in the same way that people do will open up a wide range of applications that are n’t currently available for the current generation of AI assistants,” Anthropic wrote in a blog post.
Claude’s Computer Use is still very error-prone
Anthropic admits that the feature is not perfect, it still ca n’t effectively handle scrolling, dragging, or zooming. Only 46 % of the time was it successful in an examination designed to assess its ability to text flights. However, this is a significant progress over the 36%-per-cent generation that preceded it.
Because Claude relies on pictures rather than a constant picture stream, it can lose short-lived activities or alerts. The experts acknowledge that one coding presentation led to the stoppage of the program and the search for Yellowstone National Park photos.
It scored 14.9 % on OSWorld, a system for evaluating a woman’s ability to perform as humans do, for screenshot-based things. This is a far cry from human-level skill, thought to be between 70 % and 75 %, but it is nearly double that of the next best AI system. Additionally, anthropometric hopes to use engineer feedback to enhance this capability.
There are security features that come with computer use.
According to the Anthropic scientists, a number of purposeful steps were taken to reduce the potential risk associated with computer use were taken. For protection and security, it does not teach on user-submitted information, including pictures it processes, nor could it access the internet during education.
One of the main flaws discovered is fast injection attacks, a form of “jailbreaking,” in which harmful instructions may cause the AI to act unanticipated.
Studies from the U. K. AI Safety Institute found that hack attacks had “enable clear and malignant multi-step representative behavior” in models without quite Computer Use capabilities, such as GPT-4o. A different study found that 20 % of the time, Generative AI hack problems are successful.
The Trust and Safety teams set up systems to detect and avoid such problems in Claude Sonnet 3. 5, especially given that Claude can view pictures with potentially harmful material.
However, the developers anticipated the potential for customers to abuse Claude’s computer skills. As a result, they created” classifier” and monitoring devices that detect when hazardous activities, such as email, misinformation, or false habits, may be occurring. To prevent social threats, it is also unable to post on social media or communicate with government sites.
Joint pre-deployment testing was conducted by both the U. S. and U. K. Safety Institutes, and Claude 3.5 Sonnet remains at AI Safety Level 2, meaning it does n’t pose significant risks that require more stringent safety measures than the existing.
Notice: OpenAI and Anthropic Sign Deals With U. S. AI Safety Institute, Handing Over Frontier Models For Testing
Claude 3.5 Sonnet is better at programming than its predecessor
Claude 3.5 Sonnet offers significant improvements in programming and resource use, but at the same price and speed as its predecessor, besides the computer employ beta. The new model improves its performance on SWE-bench Verified, a coding benchmark, from 33.4 % to 49 %, outpacing even reasoning models like OpenAI o1-preview.
Growing numbers of businesses are utilizing Generative AI to protocol. Nevertheless, the technology is not great in this area. AI-generated password has been known to cause interruptions, and security officials are considering banning the technology’s usage in software development.
SEE: When AI Misses the Mark: Why Tech Buyers Face Project Failures
Users of Claude 3.5 Sonnet have seen the improvements in action, according to Anthropic. When GITLab tested it for DevSecOps tasks, it demonstrated that it delivered no additional latency and up to 10 % stronger reasoning. The AI lab Cognition also reported improvements in its coding, planning, and problem-solving over the previous version.
Claude 3.5 Sonnet is available today through Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. A version of Claude apps that does n’t use computers is being released.
Claude 3.5 Haiku is cheaper but just as effective
Anthropic also launched Claude 3.5 Haiku, an upgraded version of the least expensive , Claude model. Haiku is useful for user-facing applications and creating personalized experiences from data because it provides quicker responses and improved instruction accuracy and tool use.
Haiku delivers the same level of performance as the larger Claude 3 Opus model for the same price and speed as the previous generation. It also outperforms the original Claude 3.5 Sonnet and GPT-4o on SWE-bench Verified, with a score of 40.6 %.
Claude 3.5 Haiku will be rolled out next month as a text-prompt-only model. Future image inputs will be possible.
The shift to AI agents is widespread.
The model is placed in the hands of AI agents, tools that can autonomously perform complex tasks thanks to Claude 3.5 Sonnet’s ability to use computers.
” Anthropic’s choice of the term ‘ computer use ‘ instead of’ agents ‘ makes this technology more approachable to regular users”, Yiannis Antoniou, head of Data, Analytics, and AI at technology consultancy Lab49, told TechRepublic in an email.
Agents are becoming the essential tools in businesses as they take the place of AI copilots, which are tools designed to assist and suggest suggestions to the user rather than act independently. According to the Financial Times, Microsoft, Workday, and Salesforce have all recently placed agents at the core of their AI plans.
In September, Salesforce unveiled Agentforce, a platform for deploying generative AI in areas such as customer support, service, sales, or marketing.
At this week’s SXSW Festival in Australia, IBM’s vice president of product management for its AI platform Armand Ruiz stated that the newest big development in AI will usher in an “agentic era,” in which specialized AI agents work with humans to improve organizational efficiency.
There is still a long way to go before AI can make it possible for us to perform these routine tasks and do it in a trustworthy way, and then do it in a way that is accessible for scale, explanation, and monitoring. ” But we’re going to get there, and we’re going to get there faster than we think”.
AI developers might even go as far as to say that no human intervention is required in their own creation. Meta announced last week that it would be releasing a” Self-Taught Evaluator” AI model that would be used to independently assess both its own performance and that of other AI systems, demonstrating the potential for models to learn from their own mistakes.