Arbeit 4.0 und KI: die Zukunft ist jetzt!

Anthropic’s new hybrid AI model can work on tasks autonomously for hours at a time

by Rhiannon Williams
May 22, 2025

Anthropic has announced two new AI models that it claims represent a major step toward making AI agents truly useful.

AI agents built on Claude Opus 4, the company’s most powerful model to date, raise the bar for what such systems are capable of by tackling difficult tasks over extended periods of time and responding more usefully to user instructions, the company says.

Claude Opus 4 has been built to execute complex tasks that involve completing thousands of steps over several hours. For example, it created a guide for the video game Pokémon Red while playing it for more than 24 hours straight. The company’s previous most powerful model, Claude 3.7 Sonnet, could play for just 45 minutes, says Dianne Penn, product lead for research at Anthropic.

Similarly, the company says that one of its customers, the Japanese technology company Rakuten, recently deployed Claude Opus 4 to code autonomously for close to seven hours on a complicated open-source project. 

Anthropic achieved these advances by improving the model’s ability to create and maintain “memory files” to store key information. This enhanced ability to “remember” makes the model better at completing longer tasks.

“We see this model generation leap as going from an assistant to a true agent,” says Penn. “While you still have to give a lot of real-time feedback and make all of the key decisions for AI assistants, an agent can make those key decisions itself. It allows humans to act more like a delegator or a judge, rather than having to hold these systems’ hands through every step.”

While Claude Opus 4 will be limited to paying Anthropic customers, a second model, Claude Sonnet 4, will be available to users on both paid and free tiers. Opus 4 is being marketed as a powerful, large model for complex challenges, while Sonnet 4 is described as a smart, efficient model for everyday use.

Both of the new models are hybrid, meaning they can offer a swift reply or a deeper, more reasoned response depending on the nature of a request. While they calculate a response, both models can search the web or use other tools to improve their output.

AI companies are currently locked in a race to create truly useful AI agents that are able to plan, reason, and execute complex tasks both reliably and free from human supervision, says Stefano Albrecht, director of AI at the startup DeepFlow and coauthor of Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. Often this involves autonomously using the internet or other tools. There are still safety and security obstacles to overcome. AI agents powered by large language models can act erratically and perform unintended actions—which becomes even more of a problem when they’re trusted to act without human supervision.

“The more agents are able to go ahead and do something over extended periods of time, the more helpful they will be, if I have to intervene less and less,” he says. “The new models’ ability to use tools in parallel is interesting—that could save some time along the way, so that’s going to be useful.”

As an example of the sorts of safety issues AI companies are still tackling, agents can end up taking unexpected shortcuts or exploiting loopholes to reach the goals they’ve been given. For example, they might book every seat on a plane to ensure that their user gets a seat, or resort to creative cheating to win a chess game. Anthropic says it managed to reduce this behavior, known as reward hacking, in both new models by 65% relative to Claude 3.7 Sonnet. It achieved this by more closely monitoring problematic behaviors during training, and by improving both the AI’s training environment and its evaluation methods.

