A Matrix Sees More Than You Do
A single weight matrix can turn a bland document embedding into a crisp list of how relevant it is to 'machine learning' versus 'history' versus 'philosophy', all without reading a single word. This feels impossible until you watch it happen.
1. Problem
You have thousands of document embeddings — long lists of numbers that capture the meaning of each wiki page. Your self-evolving knowledge LLM needs to answer: 'Which pages are mostly about transformers, which are about knowledge hierarchies, and which mix both?' Doing this by hand is impossible. You need an automatic way to scan every embedding and light up the exact semantic features it contains. That automatic scanner is a weight matrix that uses basis vectors to perform feature detection.
2. Concept
Think of every document embedding as a point floating in a high-dimensional space, just like the 2D maps you already know from the previous lesson on rotation matrices and scaling matrices. The standard basis is the usual set of directions: east for the first number, north for the second, and so on. Each direction is a basis vector.
A basis vector is like a single compass needle. Any point in the space can be reached by walking so many steps along each needle. That walk is called a linear combination — you scale each basis vector by a number and add them up.
Change of basis is simply swapping to a new set of compass needles that point toward the things you actually care about. Instead of "east", one new needle might point toward "machine-learning-ness". The weight matrix performs this change of basis through matrix-vector multiplication. After the change, the new coordinates tell you how strong each semantic feature is in that document.
This is feature detection: the weight matrix acts like a filter that measures how much of each new direction (each feature) is present in the original embedding. Because it is built from linear transformations, everything we learned about determinants and geometric transformations still applies — the matrix can stretch, rotate, or flip the space to make the features easier to read.