Llama-3 8b Simple Math Tie-Dye Activation Analysis on x + y dataset

xyUMAP20240709

See notebook understanding_understanding/results/all_results.ipynb

Now with class-relevance PCA from Cameron

Generic PCA

PCA + UMAP -- Cameron's soul read; shows clear separation by answer in the deeper layers
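For contrast with the class-relevance variant below, here is a minimal sketch of the generic PCA + UMAP pipeline on one layer's activations. The `activations`/`answers` names and the 50-component PCA cutoff are assumptions for illustration, not the notebook's actual code:

```python
# Minimal sketch: plain PCA -> UMAP on one layer's activations.
# Assumes `activations` is an (n_examples, hidden_dim) array and
# `answers` holds one class label per example (names are hypothetical).
import numpy as np
import umap
from sklearn.decomposition import PCA

def pca_umap(activations: np.ndarray, n_pca: int = 50) -> np.ndarray:
    """Reduce with PCA first (50 components is an arbitrary choice),
    then embed the reduced points to 2D with UMAP."""
    reduced = PCA(n_components=n_pca).fit_transform(activations)
    return umap.UMAP(n_components=2).fit_transform(reduced)

# emb = pca_umap(activations)
# plt.scatter(emb[:, 0], emb[:, 1], c=answers, cmap="tab20", s=4)
```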

We've also just now achieved much better clustering results with a slight change in our data visualization approach:

  1. First, as before, we perform PCA on the class means (where each class = a different color). This tells us the major dimensions along which the class means differ.
  2. We take the top (num_classes - 1) dimensions of this PCA; after centering, num_classes means span at most num_classes - 1 dimensions, so this keeps all of the class-mean variation.
  3. We project every example's activation onto those dimensions and perform UMAP on the result (see the sketch after the next paragraph).

This approach is designed to maximally ignore spurious variation in the embeddings and focus only on the variation that is relevant for the output, yielding crisp, clean clusters.
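A minimal sketch of the recipe above, under the same assumed shapes as before (`activations` is (n_examples, hidden_dim) for one layer, `answers` is one integer label per example; function and variable names are illustrative):

```python
# Class-relevance PCA + UMAP, following the three steps above.
import numpy as np
import umap
from sklearn.decomposition import PCA

def class_relevance_umap(activations: np.ndarray, answers: np.ndarray) -> np.ndarray:
    classes = np.unique(answers)
    # 1. Mean activation per class (class = answer value).
    class_means = np.stack(
        [activations[answers == c].mean(axis=0) for c in classes]
    )
    # 2. PCA on the class means, keeping all (num_classes - 1)
    #    informative components.
    pca = PCA(n_components=len(classes) - 1).fit(class_means)
    # Project every example onto the class-difference subspace.
    projected = pca.transform(activations)
    # 3. UMAP on the projected points.
    return umap.UMAP(n_components=2).fit_transform(projected)
```

Running UMAP on the projected points rather than the raw activations is what suppresses the answer-irrelevant variation: everything orthogonal to the class-mean subspace is discarded before UMAP ever sees it.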

Interestingly, layers 29-30 seem to be the most clustered, and layer 32 is not as cleanly separated! We don't quite know the reason for this, but we speculate it may be due to the influence of unrelated continuations (i.e., x + y = ____, x + y = ?, etc.)
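One way to put a number on "most clustered" would be a per-layer silhouette score over the class-relevance embeddings. This is a hedged sketch, not something from the notebook: `layer_activations` (a dict mapping layer index to an (n_examples, hidden_dim) array) is an assumed data structure, and it reuses the `class_relevance_umap` helper sketched above.

```python
# Hypothetical: score each layer's class-relevance embedding by how well
# the answer classes separate. `layer_activations` maps layer index ->
# (n_examples, hidden_dim) activation array; `answers` is as above.
from sklearn.metrics import silhouette_score

layer_scores = {
    layer: silhouette_score(class_relevance_umap(acts, answers), answers)
    for layer, acts in layer_activations.items()
}
# Per the note above, we'd expect the scores to peak around layers 29-30
# rather than at the final layer 32.
print(sorted(layer_scores.items(), key=lambda kv: -kv[1])[:5])
```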