Alignment faking in large language models
DHV-NET
Jan 17
1 min read
https://www.anthropic.com/news/alignment-faking