Alignment faking in large language models
DHV-NET · Jan 17, 2025 · 1 min read
https://www.anthropic.com/news/alignment-faking