A Matrix Sees More Than You Do

A single weight matrix can turn a bland document embedding into a crisp list of how relevant it is to 'machine learning' versus 'history' versus 'philosophy', all without reading a single word. This feels impossible until you watch it happen.

1. Problem

You have thousands of document embeddings — long lists of numbers that capture the meaning of each wiki page. Your self-evolving knowledge LLM needs to answer: 'Which pages are mostly about transformers, which are about knowledge hierarchies, and which mix both?' Doing this by hand is impossible. You need an automatic way to scan every embedding and light up the exact semantic features it contains. That automatic scanner is a weight matrix that uses basis vectors to perform feature detection.

2. Concept

Think of every document embedding as a point floating in a high-dimensional space, just like the 2D maps you already know from the previous lesson on rotation matrices and scaling matrices. The standard basis is the usual set of directions: east for the first number, north for the second, and so on. Each direction is a basis vector.

A basis vector is like a single compass needle. Any point in the space can be reached by walking so many steps along each needle. That walk is called a linear combination — you scale each basis vector by a number and add them up.
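
The walk described above can be sketched in a few lines. This is only an illustration in two dimensions; the names `east`, `north`, and `linearCombination` are made up for this example and do not come from any library.

```typescript
// Standard basis vectors in 2D: the "compass needles" from the text.
const east: number[] = [1, 0];
const north: number[] = [0, 1];

// Scale each basis vector by its coefficient, then add the results.
function linearCombination(coeffs: number[], basis: number[][]): number[] {
  const dim = basis[0].length;
  const point: number[] = new Array<number>(dim).fill(0);
  for (let k = 0; k < basis.length; k++) {
    for (let d = 0; d < dim; d++) {
      point[d] += coeffs[k] * basis[k][d];
    }
  }
  return point;
}

// 0.8 steps east plus 0.3 steps north lands on the point (0.8, 0.3).
const reached = linearCombination([0.8, 0.3], [east, north]);
console.log(reached); // [0.8, 0.3]
```

Swapping `east` and `north` for any other pair of non-parallel directions would reach the same points with different coefficients, which is the idea the next paragraph builds on.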

Change of basis is simply swapping to a new set of compass needles that point toward the things you actually care about. Instead of "east", one new needle might point toward "machine-learning-ness". The weight matrix performs this change of basis through matrix-vector multiplication. After the change, the new coordinates tell you how strong each semantic feature is in that document.
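
One way to convince yourself that the point never moves, only its description, is to convert to the new coordinates and back again. The sketch below assumes a 2×2 invertible matrix; `apply2x2` and `invert2x2` are hypothetical helpers written for this illustration.

```typescript
type Vec2 = [number, number];
type Mat2 = [[number, number], [number, number]];

// Multiply a 2x2 matrix by a 2D vector.
function apply2x2(m: Mat2, v: Vec2): Vec2 {
  return [
    m[0][0] * v[0] + m[0][1] * v[1],
    m[1][0] * v[0] + m[1][1] * v[1],
  ];
}

// Invert a 2x2 matrix; throws if the determinant is zero.
function invert2x2(m: Mat2): Mat2 {
  const det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
  if (det === 0) throw new Error("matrix is not invertible");
  return [
    [m[1][1] / det, -m[0][1] / det],
    [-m[1][0] / det, m[0][0] / det],
  ];
}

const toFeatures: Mat2 = [[2.0, 0.5], [-0.4, 1.8]];
const original: Vec2 = [0.8, 0.3];

const featureCoords = apply2x2(toFeatures, original);             // new coordinates
const recovered = apply2x2(invert2x2(toFeatures), featureCoords); // back to the old ones

console.log(featureCoords); // approximately [1.75, 0.22]
console.log(recovered);     // approximately [0.8, 0.3]
```

The round trip only works because the matrix is invertible. If it were not, some information about the original point would be lost in the new coordinates.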

This is feature detection: the weight matrix acts like a filter that measures how much of each new direction (each feature) is present in the original embedding. Because it is built from linear transformations, everything we learned about determinants and geometric transformations still applies — the matrix can stretch, rotate, or flip the space to make the features easier to read.
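
To connect this back to determinants, here is a small sketch that computes the determinant of three 2×2 matrices. The matrices `rotate90` and `flipX` and the helper `det2` are illustrative choices, not part of the lesson's code.

```typescript
// Determinant of a 2x2 matrix: the signed area-scaling factor of the transformation.
function det2(m: number[][]): number {
  return m[0][0] * m[1][1] - m[0][1] * m[1][0];
}

const detector = [[2.0, 0.5], [-0.4, 1.8]]; // a feature-style weight matrix
const rotate90 = [[0, -1], [1, 0]];         // quarter turn counter-clockwise
const flipX = [[-1, 0], [0, 1]];            // mirror across the vertical axis

console.log(det2(detector)); // about 3.8: areas are stretched
console.log(det2(rotate90)); // 1: a pure rotation preserves area
console.log(det2(flipX));    // -1: the negative sign means the space is flipped
```

Note that the determinant is only defined for square matrices, so this check applies when the number of features equals the number of embedding dimensions.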

The diagram above shows how the same red point receives completely new numbers once we switch to a feature basis.

3. Minimal working example

typescript

// We represent a 2-dimensional embedding as a simple vector.
// In real LLMs these vectors are often 768 or 4096 numbers long.
const embedding: number[] = [0.8, 0.3]; // a document that is somewhat about ML

// Our weight matrix has two rows; each row is one new basis vector.
// Row 0 = "ML topic" direction, Row 1 = "Hierarchy" direction.
const weightMatrix: number[][] = [
  [2.0, 0.5],   // how to combine original numbers to get ML score
  [-0.4, 1.8],  // how to combine original numbers to get hierarchy score
];

// Matrix-vector multiplication performs the change of basis.
function detectFeatures(vec: number[], W: number[][]): number[] {
  const result: number[] = new Array(W.length).fill(0);
  for (let i = 0; i < W.length; i++) {        // for each new basis vector
    for (let j = 0; j < vec.length; j++) {    // walk along the original coords
      result[i] += W[i][j] * vec[j];          // linear combination step
    }
  }
  return result;
}

const featureScores = detectFeatures(embedding, weightMatrix);
console.log("Detected features (ML, Hierarchy):", featureScores);
// Output: approximately [1.75, 0.22]; the ML score is about 8× the hierarchy score

Every line above is commented. The only new idea is that each row of the weight matrix is a basis vector defining a new direction we want to measure.

4. Example breakdown

embedding is the input vector coming from your document encoder. We treat it as coordinates in the standard basis.
weightMatrix stores the new feature directions as its rows. The first row, [2.0, 0.5], tells the computer "take 2.0 times the east coordinate plus 0.5 times the north coordinate to get the ML score".
The nested loops are exactly matrix-vector multiplication. The outer loop picks which feature we are computing; the inner loop builds the linear combination of the original coordinates.
The returned array featureScores gives the coordinates of the same point but measured in the new feature basis. High numbers mean the document strongly contains that semantic feature.
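
Another way to read the nested loops is "one dot product per row". The sketch below rewrites the same computation in that style; `dot` and `scoreRows` are names invented for this illustration, and the numbers are the ones from the minimal example.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Matrix-vector product written as one dot product per row of W.
function scoreRows(W: number[][], vec: number[]): number[] {
  return W.map(row => dot(row, vec));
}

const rowScores = scoreRows([[2.0, 0.5], [-0.4, 1.8]], [0.8, 0.3]);

// ML score:        2.0 * 0.8 + 0.5 * 0.3 = 1.75
// Hierarchy score: -0.4 * 0.8 + 1.8 * 0.3 = 0.22
console.log(rowScores.map(s => s.toFixed(2))); // ["1.75", "0.22"]
```

Seen this way, each score answers a single question: how well does the embedding line up with this one row?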

This is geometrically identical to rotating or scaling the space we explored in the previous lesson, except the new axes now carry meaning.
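
To see the link with the previous lesson, the same multiplication can be fed a plain rotation matrix. This is a sketch with illustrative helper names (`matVec`, `rotation`); it assumes the usual counter-clockwise rotation convention.

```typescript
// Generic matrix-vector product, equivalent to the nested loops in the example.
function matVec(W: number[][], vec: number[]): number[] {
  return W.map(row => row.reduce((sum, w, j) => sum + w * vec[j], 0));
}

// Counter-clockwise rotation by `theta` radians.
function rotation(theta: number): number[][] {
  return [
    [Math.cos(theta), -Math.sin(theta)],
    [Math.sin(theta), Math.cos(theta)],
  ];
}

// A quarter turn sends (0.8, 0.3) to (-0.3, 0.8).
const turned = matVec(rotation(Math.PI / 2), [0.8, 0.3]);
console.log(turned.map(x => x.toFixed(2))); // ["-0.30", "0.80"]
```

The only difference between this and feature detection is what the rows stand for: here they are rotated axes, in the detector they are topics.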

5. Extended example

Now we add a third feature and use real-looking numbers for a 4-dimensional embedding. This is closer to what a small LLM wiki would actually use.

typescript

// 4-dim embedding coming from our document encoder
const docEmbedding = [0.92, 0.41, -0.15, 0.67];

// 3×4 weight matrix; each row is a new basis vector for one semantic feature
const featureDetector = [
  [ 1.8,  0.6, -0.3,  0.9], // ML / Transformers strength
  [-0.2,  1.9,  0.8, -0.4], // Knowledge hierarchy strength
  [ 0.5, -0.7,  2.1,  1.2], // Philosophy / ethics strength
];

// detectFeatures is the function defined in the minimal working example.
const scores = detectFeatures(docEmbedding, featureDetector);
console.log("Feature vector:", scores.map(s => s.toFixed(3)));
// → ["2.550", "0.207", "0.662"]

// The highest number tells us the dominant semantic feature.
const dominant = scores.indexOf(Math.max(...scores));
console.log("Dominant feature index:", dominant); // 0 = Transformers

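We simply extended the same linear combination to more dimensions. The weight matrix does the same job as before, only now we have three new compass needles instead of two. Strictly speaking, a 3×4 matrix maps four coordinates down to three, so it measures three feature directions rather than performing a fully reversible change of basis, but the computation is identical.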
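
The same detector can be run over several pages at once to label each one with its strongest feature. In the sketch below, `pageB` and `pageC` are hypothetical embeddings invented purely for illustration, and `scoreFeatures` and `dominantLabel` are made-up helper names.

```typescript
const featureLabels = ["ML / Transformers", "Knowledge hierarchy", "Philosophy / ethics"];

const detectorMatrix = [
  [ 1.8,  0.6, -0.3,  0.9],
  [-0.2,  1.9,  0.8, -0.4],
  [ 0.5, -0.7,  2.1,  1.2],
];

// pageA is the embedding from the extended example; pageB and pageC are invented.
const pages = [
  { id: "pageA", embedding: [0.92, 0.41, -0.15, 0.67] },
  { id: "pageB", embedding: [0.05, 0.88, 0.12, -0.20] },
  { id: "pageC", embedding: [0.10, -0.30, 0.95, 0.40] },
];

// One dot product per row of the detector.
function scoreFeatures(W: number[][], vec: number[]): number[] {
  return W.map(row => row.reduce((sum, w, j) => sum + w * vec[j], 0));
}

// Label of the strongest feature for one embedding.
function dominantLabel(vec: number[]): string {
  const s = scoreFeatures(detectorMatrix, vec);
  return featureLabels[s.indexOf(Math.max(...s))];
}

for (const page of pages) {
  console.log(page.id, "->", dominantLabel(page.embedding));
}
// pageA -> ML / Transformers
// pageB -> Knowledge hierarchy
// pageC -> Philosophy / ethics
```

Picking only the top label throws away the "mix" information from the problem statement. Keeping the full score vector per page preserves it.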

6. Common mistakes

Treating the rows of the weight matrix as the final answers instead of directions. The numbers only make sense after the full matrix-vector multiplication.
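Forgetting that order matters: the weight matrix must be on the left of the vector (W @ v). With a square matrix, v @ W silently scores the embedding against the columns instead of the rows; with a non-square matrix it fails with a shape error.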
Assuming a determinant near zero means the detector is broken. A small determinant can actually be useful if you deliberately want to squash less-important directions.
Copy-pasting the same row twice in the weight matrix creates two identical basis vectors; the change of basis loses information and the feature detection becomes ambiguous.

Catch the ordering and dimension mistakes early by printing the shape of your matrix and the shape of your embedding before multiplying. The inner dimensions must match: a matrix with n columns needs an embedding of length n.
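
A shape check like this can be automated. The following is a minimal sketch; `assertCompatible` is a hypothetical helper name, and the error messages are just one possible wording.

```typescript
// Throws a descriptive error if W (rows x cols) cannot multiply vec (length cols).
function assertCompatible(W: number[][], vec: number[]): void {
  const rows = W.length;
  const cols = rows > 0 ? W[0].length : 0;
  if (W.some(row => row.length !== cols)) {
    throw new Error("weight matrix rows have inconsistent lengths");
  }
  if (cols !== vec.length) {
    throw new Error(
      `shape mismatch: matrix is ${rows}x${cols}, embedding has length ${vec.length}`
    );
  }
}

assertCompatible([[2.0, 0.5], [-0.4, 1.8]], [0.8, 0.3]); // passes silently

try {
  assertCompatible([[2.0, 0.5], [-0.4, 1.8]], [0.8, 0.3, 0.1]);
} catch (err) {
  console.log((err as Error).message);
  // shape mismatch: matrix is 2x2, embedding has length 3
}
```

Calling a check like this at the top of `detectFeatures` would turn a silent `NaN` or a truncated sum into an immediate, readable error.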

Think about it

How might the choice of which semantic features you encode in your weight matrix affect which documents your wiki decides to link together over time?

Real-World Application

Inside the self-evolving knowledge LLM wiki you are building, this exact pattern runs on every new document that arrives. The weight matrix lives inside a small feed-forward layer of the Transformer. Its rows become the basis vectors for hundreds of learned semantic features — not just three hand-crafted ones. When the system decides which pages should link to each other, it sorts by these detected feature strengths. Over time the matrix is updated through back-propagation, so the compass needles slowly rotate toward the directions that best separate high-quality wiki clusters from noisy ones. This is how a machine learns to "read" meaning geometrically without any explicit rules.
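
To make "updated through back-propagation" a little more concrete, here is a toy sketch of a single gradient-descent step. It assumes one linear layer and a squared-error loss, neither of which the lesson specifies; real Transformers train many layers jointly with automatic differentiation. The names `targetScores`, `learningRate`, and `sgdStep` are invented for this illustration.

```typescript
// Forward pass: feature scores for one embedding.
function forward(W: number[][], v: number[]): number[] {
  return W.map(row => row.reduce((sum, w, j) => sum + w * v[j], 0));
}

// Squared-error loss: 0.5 * sum of (prediction - target)^2.
function squaredLoss(W: number[][], v: number[], target: number[]): number {
  return forward(W, v).reduce((sum, p, i) => {
    const diff = p - target[i];
    return sum + 0.5 * diff * diff;
  }, 0);
}

// One gradient-descent step.
// For this loss, dLoss/dW[i][j] = (prediction[i] - target[i]) * v[j].
function sgdStep(W: number[][], v: number[], target: number[], learningRate: number): number[][] {
  const pred = forward(W, v);
  return W.map((row, i) =>
    row.map((w, j) => w - learningRate * (pred[i] - target[i]) * v[j])
  );
}

const startW = [[2.0, 0.5], [-0.4, 1.8]];
const sampleEmbedding = [0.8, 0.3];
const targetScores = [1.0, 0.0]; // hypothetical label: "a pure ML page"

const updatedW = sgdStep(startW, sampleEmbedding, targetScores, 0.1);
const lossBefore = squaredLoss(startW, sampleEmbedding, targetScores);
const lossAfter = squaredLoss(updatedW, sampleEmbedding, targetScores);
console.log(lossBefore, lossAfter); // the second number is smaller than the first
```

Each step nudges the rows of the matrix slightly, which is the "slow rotation of the compass needles" described above.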

The same idea powers retrieval in vector databases: the change of basis turns raw embeddings into a space where nearest-neighbor search actually finds documents that share the same semantic features.
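
As a rough sketch of that retrieval idea, the code below ranks a few candidate pages by cosine similarity after mapping them into feature space. Cosine similarity is one common choice, assumed here rather than taken from the lesson, and the candidate embeddings are invented for illustration.

```typescript
// Map an embedding into feature space (one dot product per detector row).
function toFeatureSpace(W: number[][], v: number[]): number[] {
  return W.map(row => row.reduce((sum, w, j) => sum + w * v[j], 0));
}

// Cosine of the angle between two vectors: 1 means same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  const dotAB = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dotAB / (normA * normB);
}

const topicDetector = [
  [ 1.8,  0.6, -0.3,  0.9],
  [-0.2,  1.9,  0.8, -0.4],
  [ 0.5, -0.7,  2.1,  1.2],
];

const queryEmbedding = [0.92, 0.41, -0.15, 0.67];

// Hypothetical candidate pages, invented for this example.
const candidates = [
  { id: "pageB", embedding: [0.05, 0.88, 0.12, -0.20] },
  { id: "pageC", embedding: [0.10, -0.30, 0.95, 0.40] },
  { id: "pageD", embedding: [0.85, 0.30, -0.10, 0.70] },
];

const queryFeatures = toFeatureSpace(topicDetector, queryEmbedding);
const ranked = candidates
  .map(c => ({
    id: c.id,
    similarity: cosineSimilarity(queryFeatures, toFeatureSpace(topicDetector, c.embedding)),
  }))
  .sort((x, y) => y.similarity - x.similarity);

console.log(ranked.map(r => r.id)); // ["pageD", "pageC", "pageB"]
```

A production vector database would use an approximate nearest-neighbor index instead of scoring every candidate, but the comparison it performs is of this kind.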

Quiz

Q1: What is a basis vector?
A. A single number stored in a matrix
B. A direction in space that you can scale and add to reach any point
C. The result after multiplying two matrices
D. A special rotation that only works in 2D

Correct: B
A basis vector is a direction; any other vector is a linear combination of these directions.

Q2: What does change of basis accomplish?
A. It deletes some numbers from the vector
B. It re-expresses the same geometric point using a different set of compass needles
C. It only works when the determinant is exactly 1
D. It converts a vector into a matrix

Correct: B
The coordinates change, but the underlying point stays the same.

Q3: In the context of document embeddings, what does feature detection mean?
A. Counting how many words are in the document
B. Measuring how strongly each learned semantic direction is present
C. Rotating the embedding by 90 degrees
D. Storing the embedding in a database

Correct: B
The weight matrix projects the embedding onto new basis vectors tuned to topics like "machine learning" or "hierarchy".

Q4: Given the minimal example above, if you change the embedding to [1.0, 0.0] and keep the same 2×2 weight matrix, what are the new feature scores?
A. [2.0, -0.4]
B. [1.8, 0.5]
C. [0.5, 1.8]
D. [0.8, 0.3]

Correct: A
Perform the matrix-vector multiplication: first row dot product gives 2.0 × 1.0 + 0.5 × 0.0 = 2.0; second row gives –0.4 × 1.0 + 1.8 × 0.0 = –0.4.

Q5: Why might two different weight matrices both have a determinant of 1.0 yet produce very different feature detections on the same set of embeddings?
A. One of them must be a rotation matrix and the other a scaling matrix
B. The rows point in completely different semantic directions even though they preserve volume
C. The determinant has no relation to feature detection
D. Only integer matrices can be used for features

Correct: B
A determinant of 1.0 tells you the transformation preserves volume, but the actual orientation of the new basis vectors (the meaning of each feature) can be completely different.
