Tuesday, December 02, 2025

Crumbling Under Pressure: PropensityBench Reveals AI’s Weaknesses

Get shucking: South Australians urged to eat oysters and donate shells for reef restoration project


Colombia’s Cocaine Hippos Ben Goldsmith


The Decline of Deviance Experimental History


Call Security! Archedelia. ‘Skateboarders versus the “Smart City.”’


What Captures Our Attention in an Algorithmic Age? A Statistical Analysis Stat Significant


Thai woman found alive in coffin after being brought in for cremation AP


We’re basically pushers’: Court filing alleges staff at social media giants compared their platforms to drugs Politico



Crumbling Under Pressure: PropensityBench Reveals AI’s Weaknesses

Scale.com: “AI models are now being used in more high-stakes settings, and not every situation goes according to plan. When a model’s safe approach starts to fail, will it stay on the safe path or reach for a harmful shortcut that works instead? Understanding how models behave in those pressure moments is one of the most important — and most overlooked — aspects of AI safety.

 This evaluation frames the LLM-based agent itself as the threat actor, with in most cases misuse arising from the desire for goal driven efficiency rather than inherent malice. To measure the propensity of agents to make unsafe choices, Scale, the University of Maryland, and other collaborators developed PropensityBench. 

This benchmark simulates real-world pressure by allowing agents to choose between a safe approach that consistently fails and a functional, harmful shortcut, revealing their true inclinations. The benchmark reveals that agent safety compromises significantly under pressure. See the rankings on the leaderboard.

See also related paper  PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach. Udari Madhushani Sehwag, Shayan Shabihi, Alex McAvoy, Vikash Sehwag, Yuancheng Xu, Dalton Towers, Furong Huang. Scale AI, University of Maryland, College Park, University of North Carolina at Chapel Hill, Google DeepMind, Netflix, University of Texas at Austin. 

Equal Contributions. “Recent advances in Large Language Models (LLMs) have sparked concerns over their potential to acquire and misuse dangerous or high-risk capabilities, posing frontier risks. Current safety evaluations primarily test for what a model can do—its capabilities—without assessing what it would do if endowed with high-risk capabilities. 

This leaves a critical blind spot: models may strategically conceal capabilities or rapidly acquire them, while harboring latent inclinations toward misuse. We argue that propensity—the likelihood of a model to pursue harmful actions if empowered—is a critical, yet underexplored, axis of safety evaluation. 

We present PropensityBench, a novel benchmark framework that assesses the proclivity of models to engage in risky behaviors when equipped with simulated dangerous capabilities using proxy tools. Our framework includes 5,874 scenarios with 6,648 tools spanning four high-risk domains: cybersecurity, self-proliferation, biosecurity, and chemical security. 

We simulate access to powerful capabilities via a controlled agentic environment and evaluate the models’ choices under varying operational pressures that reflect real-world constraints or incentives models may encounter, such as resource scarcity or gaining more autonomy. 

Across open-source and proprietary frontier models, we uncover alarming signs of propensity: models frequently choose high-risk tools when under pressure, despite lacking the capability to execute such actions unaided. These findings call for a shift from static capability audits toward dynamic propensity assessments as a prerequisite for deploying frontier AI systems safely.”