AI Alignment Research

1 year ago

Exclusive: New Research Shows AI Strategically Lying

Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit ...

2 years ago

An AI Pause Is Humanity’s Best Bet For Preventing Extinction

Constantly improving AI would create a positive feedback loop: an intelligence explosion. We would be no match for it.

TechCrunch

OpenAI’s research on AI models deliberately lying is wild

Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip indicated multiple universes exist. Or when Anthropic gave its AI ...

Computer Weekly

UK AI alignment project gets OpenAI and Microsoft boost

OpenAI and Microsoft are the latest companies to back the UK’s AI Security Institute (AISI). The two firms have pledged support for the Alignment Project, an international effort to work towards ...

ZDNet

AI models know when they're being tested - and change their behavior, research shows

Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...

Geeky Gadgets

Ilya Sutskever Calls for Core AI Research : Scaling Won’t Take Us Further

What happens when the strategies that propelled an entire field to unprecedented heights begin to falter? For artificial intelligence, this is no longer a hypothetical question. After years of ...

From MSN

Aligning those who align AI, one satirical website at a time

The work of creating artificial intelligence that holds to the guardrails of human values, known in the industry as alignment, has developed into its own (somewhat ambiguous) field of study rife with ...

TMCnet

Proving AI Value Is the Defining Test for IT Leadership, Says Info-Tech Research Group in ...

CIOs across the UK and Europe are entering 2026 under mounting pressure to demonstrate measurable business value from ...
