AI Alignment is More Fragile Than You Think (And the Shoggoth is Lurking)

September 30, 2025

One of the defining features of LLMs is their ability to produce responses aligned with human ethical principles and intentions. In his talk, Przemek will explore what LLMs truly "know" about morality, discuss how models are trained for AI Alignment, and reveal how surprisingly easy that alignment is to break. He'll present experimental results in which a small set of harmful examples was enough to derail alignment. A live demo and an open discussion of what this means for us will follow.