All videos
AI Alignment is More Fragile Than You Think (And the Shoggoth is Lurking)
September 30, 2025
One of the fundamental features of LLMs is their ability to provide responses aligned with human ethical principles and intentions. In his talk, Przemek will explore what LLMs truly "know" about morals, discuss how models are trained for AI alignment, and reveal how surprisingly easy it is to break. He'll show experimental results in which a small set of harmful training examples was enough to derail alignment. A live demo and an open discussion of what it all means for us will follow.
Other videos that you might like
A story about connecting microservices
Paweł Kamiński
All you need to know to land and manage your first UX job
Monika Soja, Mira Melhor
How I gained from gracing the world with yet another NPM package
Mikołaj Klaman
The most important 5 minutes of a new leader
Dawid Ostręga