A Reverse Engineer’s Guide to Mechanistic Interpretability
I recently spoke at DEF CON 32 to introduce the field of mechanistic interpretability to the reverse engineering community. The field focuses on decoding the inner workings of AI systems, and it connects traditional reverse engineering with current AI research. Below I’ve included the abstract from the talk along with the slides.
You can view the video of the talk on the home page of AICyberChallenge.com, under the heading “Day 2: Dr Andrew Fasano”.
Abstract
While the world buzzes about AI-augmented reverse engineering, what about turning the tables and reverse engineering AI itself? As artificial intelligence systems grow increasingly complex and pervasive, decoding their inner workings has become not just a fun challenge, but a critical necessity. This talk introduces the emerging field of mechanistic interpretability to the reverse engineering community, revealing how the frontier of AI research is reinventing wheels long familiar to RE experts. We’ll explore how traditional reverse engineering techniques are finding new life in dissecting neural networks, and why the RE community’s hard-earned wisdom is more relevant than ever in the age of AI.
The presentation will demystify key concepts in mechanistic interpretability such as features, circuits, and superposition, mapping them onto familiar RE paradigms. Attendees will gain insights into:
- The parallels between reverse engineering software and decoding AI systems
- Current challenges in mechanistic interpretability
- The golden opportunities for reverse engineers to contribute to this critical field and potentially reshape the future of AI safety
This talk aims to spark cross-pollination between the reverse engineering and AI research communities. Whether you’re a seasoned reverse engineer itching for a new challenge, or an AI researcher seeking fresh perspectives, prepare to view artificial intelligence through a new lens.