80,000 Hours podcast: Ajeya Cotra on accidentally teaching AI models to deceive us

“I don’t know yet what suite of tests exactly you could show me, and what arguments you could show me, that would make me actually convinced that this model has a sufficiently deeply rooted motivation to not try to escape human control. I think that’s, in some sense, the whole heart of the alignment problem.”

Topics: