Research
LLM Agents Struggle With Real-World Ambiguity and Complex Tasks
New research highlights the significant hurdles large language model agents face when confronting underspecified instructions and intricate operational problems.
LLM Agents Show Vulnerabilities in Critical Systems Testing
New research reveals that large language model agents, intended for safety-critical roles, are susceptible to multi-turn attacks and bias propagation.
New AI Research Boosts LLM Learning for Complex, Long-Term Tasks
New research shows how large language models can learn to navigate dynamic environments, assess information, and even prove theorems, moving beyond simple fact retrieval.
New AI Research Explores LLM Reasoning, Economic Analysis, and Diffusion Models
Recent arXiv papers shed light on how large language models are evolving, from grounding economic forecasts in data to tackling complex combinatorial problems and exploring new architectural designs.
LLM Agents Tackle Scientific Data Chaos and Nuclear Plant Safety
New research shows large language model agents are being tested for critical roles, from organizing messy scientific data to overseeing nuclear power plant operations.
LLM Agents Tackle Scientific Data and Nuclear Safety
New research shows large language models are being pushed into critical roles, from standardizing complex scientific data to operating simulated nuclear power plants, highlighting both their promise and their risks.
New AI Research Improves LLM Learning with Advanced Reinforcement Techniques
Researchers are exploring new ways to train large language models, moving beyond simple task completion to enable deeper, more adaptive, and even self-reflective AI agents.
AI Memory Systems Can Degrade Model Performance
New research suggests that the way AI models remember past conversations can make them less effective and even more prone to flattery.
AI Memory Systems Can Degrade Performance, Research Finds
New research suggests a common approach to giving AI models 'memory' can actually make them less effective and more prone to flattery.
Nuclear Reactor AI Shows Promise for Safety and Control
New research suggests a focused AI model can accurately manage complex physical systems, moving beyond the limitations of general-purpose AI for critical infrastructure.
AI Agents Struggle with Real-World Tool Failures, New Benchmark Reveals
A new study shows AI assistants break down when their digital tools malfunction, a problem that scaling up models alone can't fix.
New AI Guardrail System Helps LLMs Stay on Task, Avoid Risks
A new research paper introduces a system to help large language models navigate risky situations without shutting down an entire task, improving AI safety and efficiency.
New Framework Aims to Make AI Simulations More Realistic
A new research paper introduces a framework to better anchor agent-based AI models in reality, which is crucial for their practical application.
New AI Research Highlights Challenges in Autonomous Agent Safety
A recent study reveals significant hurdles in designing AI systems that know when to ask for human help, a critical safety feature.
New Research: AI Agents Struggle With Knowing When to Ask for Help
A new study reveals the tricky problem of timing interventions for autonomous AI systems, highlighting a key safety challenge.
AI Coding Tools Boost Speed, Not Quality, Researchers Warn
Developers are increasingly reliant on AI assistants, but new research suggests the speed gains come with potential long-term risks to software quality and to developers' own careers.
CausalFlow Helps AI Agents Learn From Their Mistakes
A new research paper introduces CausalFlow, a method for large language model agents to diagnose and fix their own errors, improving reliability.
New AI Research Proposes Scaling 'Harnesses' Around Large Language Models
A new paper argues the next big challenge in AI isn't just building bigger models, but building robust systems around them.
World Models: The Next Step Beyond LLMs for True AI Reasoning
New research suggests large language models struggle with true reasoning, pointing to 'world models' as a path toward more capable AI.
New AI Research Improves Safety for LLM Agents
A new research paper introduces 'SafeHarbor,' a system designed to make AI agents safer without sacrificing their usefulness in the real world.
New AI Research Reveals Memory Poisoning Threat to Agent Systems
A new paper highlights a subtle but potent attack vector that can make AI systems misbehave in ways that are hard to detect.
New AI Research Tackles 'Epistemic Miscalibration' in Multi-Agent Systems
A new research paper explores why AI systems can fail by misjudging their own knowledge even when they execute perfectly, and proposes a fix.
LLM Agents Struggle with Complex Backend Code Generation
New research highlights a key limitation in AI's ability to write production-ready software, posing a challenge for automating development.
New GVGAI-LLM Benchmark Reveals LLM Weaknesses in Video Games
A new academic benchmark uses classic video games to expose the current limits of large language models, pointing to key areas for improvement.
PrismLLM Simulates AI Supercomputer Training on Few GPUs
A new research paper details how engineers can replicate massive AI training runs using only a handful of graphics processing units, potentially cutting development costs and time.
New Study Maps LLM Confidence Across Knowledge Areas
A recent research paper reveals that large language models are better at judging their own knowledge in some subjects than others, with implications for their reliability.
AI Outperforms Doctors in Emergency Room Diagnosis Study
New research suggests AI could improve medical accuracy, raising questions about the future role of human expertise in healthcare.
New AI Model 'Mochi' Learns Faster, Improves Graph Data Analysis
A new AI model called Mochi promises to make sense of complex, interconnected data more efficiently, with implications for many industries.
The Arena Gap: Inside the 2.7% That Separates U.S. and Chinese Frontier Models
A close look at what 39 Arena points actually mean, where each lab is winning, and the policy gears now turning in Washington and Beijing.