Rotating and Stretching Data Spaces
A single 2-by-2 matrix can rotate the embedding of the phrase "machine learning" by 90 degrees and simultaneously stretch it to three times its distance from the origin, instantly moving it closer in vector space to wiki entries like "neural network" and "deep learning".
Problem
You have built a small wiki inside an LLM where each concept such as "machine learning" lives as an embedding (a vector). You want to move this embedding toward related ideas without manually editing every number. A learned weight matrix should do the heavy lifting, acting like a gentle push that rotates and stretches the idea in the right direction so it sits near its neighbors. This is the exact job linear transformations perform inside every Transformer layer.
Concept
Think of a vector as an arrow on a city map. A geometric transformation is like applying a rule that tells every arrow where to point next. One common tool is a rotation matrix. It spins every arrow by a fixed angle without changing its length. Another tool is a scaling matrix. It stretches or shrinks arrows along each direction, like stretching a rubber sheet in one way but not another.
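The two tools above can be written down directly. Here is a minimal sketch (the function names rotationMatrix and scalingMatrix are illustrative): for an angle theta, the rotation matrix has cosines on the diagonal and sines off it, while a scaling matrix is diagonal with one stretch factor per axis.

```typescript
// Build a 2x2 matrix that rotates every vector counterclockwise by theta radians.
function rotationMatrix(theta: number): number[][] {
  return [
    [Math.cos(theta), -Math.sin(theta)],
    [Math.sin(theta),  Math.cos(theta)],
  ];
}

// Build a 2x2 matrix that stretches the x-axis by sx and the y-axis by sy.
function scalingMatrix(sx: number, sy: number): number[][] {
  return [
    [sx, 0],
    [0, sy],
  ];
}

// A quarter turn: rotationMatrix(Math.PI / 2) is (up to float rounding)
// [[0, -1], [1, 0]], the matrix used throughout this lesson.
console.log(rotationMatrix(Math.PI / 2));
console.log(scalingMatrix(2, 2)); // uniform stretch: [[2, 0], [0, 2]]
```

When sx equals sy the rubber sheet stretches the same amount in every direction; when they differ, one axis stretches more than the other.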
Multiplying a matrix by a vector is called matrix-vector multiplication, and the result is the new position of the arrow after the transformation. The determinant of the matrix tells you how much the whole map has been stretched or squashed overall. A determinant of 2 means every area became twice as large. A negative determinant means the map was also flipped over.
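For a 2-by-2 matrix [[a, b], [c, d]] the determinant is simply a·d − b·c, so it is easy to check these claims in code (a sketch; the helper name det2x2 is our own):

```typescript
// Determinant of a 2x2 matrix [[a, b], [c, d]] is a*d - b*c.
function det2x2(m: number[][]): number {
  return m[0][0] * m[1][1] - m[0][1] * m[1][0];
}

const scaleBy2 = [[2, 0], [0, 2]];   // uniform scaling by 2
const rotate90 = [[0, -1], [1, 0]];  // 90-degree counterclockwise rotation
const flip     = [[0, 1], [1, 0]];   // reflection across the line y = x

console.log(det2x2(scaleBy2)); // 4: lengths double, so areas become 4x larger
console.log(det2x2(rotate90)); // 1: rotation preserves area
console.log(det2x2(flip));     // -1: area preserved, but orientation flipped
```

Note that uniform scaling by 2 gives a determinant of 4, not 2: the determinant measures area change, and doubling both axes quadruples every area.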
All of these together let a single weight matrix act as a learned nudge that moves embeddings of related wiki entries closer together in the hidden space.
This live simulation shows exactly how a rotation matrix and scaling matrix combine into one geometric transformation. Drag the sliders and watch the input vector (our "machine learning" embedding) swing around while the length changes. The determinant updates instantly so you can see whether the space is being enlarged or shrunk.
Minimal working example
Below is a tiny TypeScript function that applies a transformation to a vector. Every line is commented.
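A minimal sketch matching the breakdown that follows (the names matrixVectorMultiply, rotationMatrix, and scalingMatrix are the ones the breakdown refers to):

```typescript
// A 2D vector and a 2x2 matrix, stored as plain arrays.
type Vec2 = [number, number];
type Mat2 = [[number, number], [number, number]];

// Multiply a 2x2 matrix by a 2D vector: each output component is
// the dot product of one matrix row with the input vector.
function matrixVectorMultiply(m: Mat2, v: Vec2): Vec2 {
  return [
    m[0][0] * v[0] + m[0][1] * v[1], // first row  · vector
    m[1][0] * v[0] + m[1][1] * v[1], // second row · vector
  ];
}

// Our "machine learning" embedding starts at (3, 1).
const embedding: Vec2 = [3, 1];

// Rotate 90 degrees counterclockwise: the columns are the rotated basis vectors.
const rotationMatrix: Mat2 = [[0, -1], [1, 0]];

// Stretch every direction by a factor of 2.
const scalingMatrix: Mat2 = [[2, 0], [0, 2]];

// Apply the rotation first, then the scaling.
const rotated = matrixVectorMultiply(rotationMatrix, embedding); // [-1, 3]
const result  = matrixVectorMultiply(scalingMatrix, rotated);    // [-2, 6]

console.log(result); // [-2, 6]
```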
Example breakdown
We start with an embedding that lives at coordinates (3, 1). This is a concrete vector.
The rotationMatrix turns every vector 90 degrees counterclockwise. Its columns are the new basis vectors after rotation. Because we have already covered linear transformations, we know this matrix represents a pure rotation.
The scalingMatrix stretches every direction equally. Multiplying by 2 doubles every length (and therefore quadruples every area).
matrixVectorMultiply does the actual work. For each row of the weight matrix it computes the weighted sum of the input components. This is exactly how a neural network layer transforms an embedding.
We apply the rotation first, then the scaling. Because our scaling here is uniform (the same factor 2 in every direction), swapping the order would happen to land in the same place; with a non-uniform scaling, though, the order changes the final position, because matrix multiplication is not commutative in general.
The final numbers [-2, 6] tell us the "machine learning" idea has now been moved to a place twice as far from the origin and rotated toward the positive y-axis, closer to the embeddings of related wiki pages.
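To make the ordering point concrete, here is a sketch that swaps in a non-uniform stretch (the hypothetical stretchX matrix stretches only the x-axis), where rotating-then-stretching and stretching-then-rotating land in visibly different places:

```typescript
type Vec2 = [number, number];
type Mat2 = [[number, number], [number, number]];

function matrixVectorMultiply(m: Mat2, v: Vec2): Vec2 {
  return [
    m[0][0] * v[0] + m[0][1] * v[1],
    m[1][0] * v[0] + m[1][1] * v[1],
  ];
}

const embedding: Vec2 = [3, 1];
const rotationMatrix: Mat2 = [[0, -1], [1, 0]]; // 90 degrees counterclockwise
const stretchX: Mat2 = [[2, 0], [0, 1]];        // stretch x only (non-uniform)

// Rotate first, then stretch the x-axis:
const a = matrixVectorMultiply(stretchX, matrixVectorMultiply(rotationMatrix, embedding));
// Stretch the x-axis first, then rotate:
const b = matrixVectorMultiply(rotationMatrix, matrixVectorMultiply(stretchX, embedding));

console.log(a); // [-2, 3]
console.log(b); // [-1, 6] — a different point: the order matters
```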
This step-through lets you cycle through four different matrices while keeping the same input embedding fixed. Watch how a pure rotation matrix, a pure scaling matrix, a combined transformation, and a realistic learned weight matrix each move the vector to a completely different place. The numbers update at every click so you can connect the algebra to the geometry.
Extended example
Now we make the function more realistic for a wiki. We combine a rotation, scaling, and a small learned nudge that mimics a weight matrix inside an attention layer.
The learnedTransform matrix was not hand-crafted. In a real system it would be updated during training so that embeddings of related concepts naturally drift toward one another. This is the geometric heart of how Transformers organize knowledge.
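A sketch of what this extended version can look like. The learnedTransform values below are invented for illustration: a near-identity matrix with small off-diagonal terms, the kind of gentle nudge training might produce. We also compose all three matrices into a single matrix first, since in a real layer one weight matrix does all the work at once.

```typescript
type Vec2 = [number, number];
type Mat2 = [[number, number], [number, number]];

function matrixVectorMultiply(m: Mat2, v: Vec2): Vec2 {
  return [
    m[0][0] * v[0] + m[0][1] * v[1],
    m[1][0] * v[0] + m[1][1] * v[1],
  ];
}

// Multiply two 2x2 matrices so several transformations compose into one.
function matrixMultiply(a: Mat2, b: Mat2): Mat2 {
  return [
    [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
    [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
  ];
}

const rotationMatrix: Mat2 = [[0, -1], [1, 0]]; // 90 degrees counterclockwise
const scalingMatrix: Mat2 = [[2, 0], [0, 2]];   // uniform stretch by 2

// Stand-in for a trained weight matrix; these numbers are made up.
const learnedTransform: Mat2 = [[1.05, 0.10], [-0.08, 0.97]];

// Compose right to left: rotate, then scale, then apply the learned nudge.
const combined = matrixMultiply(learnedTransform, matrixMultiply(scalingMatrix, rotationMatrix));

const embedding: Vec2 = [3, 1];
const transformed = matrixVectorMultiply(combined, embedding);
console.log(transformed); // approximately [-1.5, 5.98]
```

Folding the three steps into one combined matrix is exactly what a trained layer stores: the rotation, the stretch, and the nudge are no longer separate, they are baked into a single grid of weights.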
Common mistakes
Beginners often forget that matrix-vector multiplication is not ordinary multiplication. Writing matrix * vector with the * operator in JavaScript does not throw an error; it silently coerces both arrays and produces NaN. The fix is to keep the matrixVectorMultiply helper from the minimal example.
Another frequent error is mixing row-major and column-major order. If your rotation looks wrong, check whether the matrix rows or columns represent the new basis vectors. Printing the determinant quickly reveals the mistake: a value near zero means the matrix has collapsed the space into a line and you will lose information.
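A determinant check along these lines takes only a few lines (a sketch; det2x2 and the tolerance are our own choices):

```typescript
// Determinant of a 2x2 matrix [[a, b], [c, d]] is a*d - b*c.
function det2x2(m: number[][]): number {
  return m[0][0] * m[1][1] - m[0][1] * m[1][0];
}

// A rank-deficient matrix: the second row is twice the first, so every
// input vector lands on the same line through the origin.
const collapsed = [[1, 2], [2, 4]];

const d = det2x2(collapsed);
if (Math.abs(d) < 1e-6) {
  console.log("Warning: determinant near 0 — this matrix collapses 2D space onto a line");
}
```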
Finally, people sometimes apply the scaling before the rotation when the model expects the reverse order. The quickest way to spot this is to run the live simulation above with the same numbers and compare where the arrow ends up.
These small bugs become visible immediately on a 2D canvas, which is why geometric intuition helps debug models before they ever touch a GPU.
Real-World Application
Inside a vector database that powers your self-evolving knowledge LLM, each new wiki article arrives as a fresh embedding. A single learned weight matrix can rotate and stretch the entire set of embeddings so that "machine learning" naturally sits between "neural network" and "gradient descent" without any extra code. The determinant of that matrix tells you whether the transformation is preserving distances (determinant near 1) or deliberately expanding certain directions to emphasize rare concepts. When you train the next layer, these geometric transformations compound, letting the model discover higher-level relationships exactly as Andrej Karpathy describes in his recent explorations of self-organizing knowledge systems. You can now experiment with these ideas in a few lines of TypeScript before you ever touch a full Transformer.
This lesson gives you the concrete visual and coding foundation you need before the next lesson on detecting features via basis vectors, where we will see how a matrix can automatically discover which directions in embedding space carry the most meaning.