Blog
Notes on my research, and whatever else is worth writing down carefully.
Your Evals Will Break and You Won't See It Coming
Why LLM evaluations may fail exactly when models undergo qualitative phase transitions.